{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.i - ProgressBar and merging random forests - assignment\n", "\n", "The point of this notebook is not to do machine learning but to modify the *fit* function so that it displays a progress bar in the notebook. When the printed output (print) is so abundant that it fills the whole screen, a progress bar is a practical and effective alternative. We apply this to an ensemble of random forests."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["<div id=\"my_id_menu_nb\">run previous cell, wait for 2 seconds</div>\n", "<script>\n", "function repeat_indent_string(n){\n", "    var a = \"\" ;\n", "    for ( ; n > 0 ; --n) {\n", "        a += \"    \";\n", "    }\n", "    return a;\n", "}\n", "var update_menu_string = function(begin, lfirst, llast, sformat, send, keep_item) {\n", "    var anchors = document.getElementsByClassName(\"section\");\n", "    if (anchors.length == 0) {\n", "        anchors = document.getElementsByClassName(\"text_cell_render rendered_html\");\n", "    }\n", "    var i,t;\n", "    var text_menu = begin;\n", "    var text_memo = \"<pre>\\nlength:\" + anchors.length + \"\\n\";\n", "    var ind = \"\";\n", "    var memo_level = 1;\n", "    var href;\n", "    var tags = [];\n", "    var main_item = 0;\n", "    for (i = 0; i <= llast; i++) {\n", "        tags.push(\"h\" + i);\n", "    }\n", "\n", "    for (i = 0; i < anchors.length; i++) {\n", "        text_memo += \"**\" + anchors[i].id + \"--\\n\";\n", "\n", "        var child = null;\n", "        for(t = 0; t < tags.length; t++) {\n", "            var r = anchors[i].getElementsByTagName(tags[t]);\n", "            if (r.length > 0) {\n", "child = r[0];\n", "break;\n", "            }\n", "        }\n", "        if (child == null){\n", "            text_memo += \"null\\n\";\n", "            continue;\n", "        }\n", "        if 
(anchors[i].hasAttribute(\"id\")) {\n", "            // when converted in RST\n", "            href = anchors[i].id;\n", "            text_memo += \"#1-\" + href;\n", "            // passer \u00e0 child suivant (le chercher)\n", "        }\n", "        else if (child.hasAttribute(\"id\")) {\n", "            // in a notebook\n", "            href = child.id;\n", "            text_memo += \"#2-\" + href;\n", "        }\n", "        else {\n", "            text_memo += \"#3-\" + \"*\" + \"\\n\";\n", "            continue;\n", "        }\n", "        var title = child.textContent;\n", "        var level = parseInt(child.tagName.substring(1,2));\n", "\n", "        text_memo += \"--\" + level + \"?\" + lfirst + \"--\" + title + \"\\n\";\n", "\n", "        if ((level < lfirst) || (level > llast)) {\n", "            continue ;\n", "        }\n", "        if (title.endsWith('\u00b6')) {\n", "            title = title.substring(0,title.length-1).replace(\"<\", \"&lt;\").replace(\">\", \"&gt;\").replace(\"&\", \"&amp;\")\n", "        }\n", "\n", "        if (title.length == 0) {\n", "            continue;\n", "        }\n", "\n", "        while (level < memo_level) {\n", "            text_menu += \"</ul>\\n\";\n", "            memo_level -= 1;\n", "        }\n", "        if (level == lfirst) {\n", "            main_item += 1;\n", "        }\n", "        if (keep_item != -1 && main_item != keep_item + 1) {\n", "            // alert(main_item + \" - \" + level + \" - \" + keep_item);\n", "            continue;\n", "        }\n", "        while (level > memo_level) {\n", "            text_menu += \"<ul>\\n\";\n", "            memo_level += 1;\n", "        }\n", "        text_menu += repeat_indent_string(level-2) + sformat.replace(\"__HREF__\", href).replace(\"__TITLE__\", title);\n", "    }\n", "    while (1 < memo_level) {\n", "        text_menu += \"</ul>\\n\";\n", "        memo_level -= 1;\n", "    }\n", "    text_menu += send;\n", "    //text_menu += \"\\n\" + text_memo;\n", 
"    return text_menu;\n", "};\n", "var update_menu = function() {\n", "    var sbegin = \"\";\n", "    var sformat = '<li><a href=\"#__HREF__\">__TITLE__</a></li>';\n", "    var send = \"\";\n", "    var keep_item = -1;\n", "    var text_menu = update_menu_string(sbegin, 2, 4, sformat, send, keep_item);\n", "    var menu = document.getElementById(\"my_id_menu_nb\");\n", "    menu.innerHTML=text_menu;\n", "};\n", "window.setTimeout(update_menu,2000);\n", "            </script>"], "text/plain": ["<IPython.core.display.HTML object>"]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Module tqdm\n", "\n", "Le module [tqdm](https://pypi.python.org/pypi/tqdm) permet d'afficher le progr\u00e8s d'un processus assez long. Quelques exemples issus de la documentation."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["Processing d: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 4/4 [00:00<00:00, 7989.15it/s]\n"]}], "source": ["from tqdm import tqdm\n", "pbar = tqdm([\"a\", \"b\", \"c\", \"d\"])\n", "for char in pbar:\n", "    pbar.set_description(\"Processing %s\" % char)"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "6416ba6864734c1d800dc91b8b04fdd1"}}, "metadata": {}, "output_type": "display_data"}, {"name": "stdout", "output_type": "stream", "text": ["\n"]}], "source": 
["from tqdm import tnrange\n", "from random import random, randint\n", "from time import sleep\n", "\n", "t = tnrange(100)\n", "for i in t:\n", "    # Description will be displayed on the left\n", "    t.set_description('GEN %i' % i)\n", "    # Postfix will be displayed on the right, and will format automatically\n", "    # based on argument's datatype\n", "    t.set_postfix(loss=random(), gen=randint(1,999), str='h', lst=[1, 2])\n", "    sleep(0.1)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## RandomForest\n", "\n", "The Stack Overflow question [Combining random forest models in scikit learn](http://stackoverflow.com/questions/28489667/combining-random-forest-models-in-scikit-learn) explains how to merge random forests. The goal is to train 10 of them on any dataset while displaying a progress bar. What is the [warm_start](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier) parameter for? Could we plot how the error rate decreases as the number of trees grows?"]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": ["from sklearn.datasets import make_classification\n", "from sklearn.model_selection import train_test_split\n", "X, y = make_classification(n_samples=500, n_features=25,\n", "                           n_clusters_per_class=1, n_informative=15)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/plain": ["RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", "            max_depth=None, max_features='auto', max_leaf_nodes=None,\n", "            min_impurity_decrease=0.0, min_impurity_split=None,\n", "            min_samples_leaf=1, min_samples_split=2,\n", "            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", "            oob_score=False, random_state=None, verbose=0,\n", "            warm_start=False)"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.ensemble import RandomForestClassifier\n", "model = RandomForestClassifier()\n", "model.fit(X_train, y_train)"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1"}}, "nbformat": 4, "nbformat_minor": 2}