{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.i - ProgressBar and merging random forests - exercise\n", "\n", "The goal of this notebook is not to do machine learning but to modify the *fit* function so that it displays a progress bar in the notebook. When printed output (print) becomes too abundant and fills the whole screen, a progress bar is a practical and effective alternative. We apply this to an ensemble of random forests."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Module tqdm\n", "\n", "Le module [tqdm](https://pypi.python.org/pypi/tqdm) permet d'afficher le progr\u00e8s d'un processus assez long. Quelques exemples issus de la documentation."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["Processing d: 100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 4/4 [00:00<00:00, 7989.15it/s]\n"]}], "source": ["from tqdm import tqdm\n", "pbar = tqdm([\"a\", \"b\", \"c\", \"d\"])\n", "for char in pbar:\n", " pbar.set_description(\"Processing %s\" % char)"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"application/vnd.jupyter.widget-view+json": {"model_id": "6416ba6864734c1d800dc91b8b04fdd1"}}, "metadata": {}, "output_type": "display_data"}, {"name": "stdout", "output_type": "stream", "text": ["\n"]}], "source": ["from tqdm import tnrange\n", "from random import random, randint\n", "from time import sleep\n", "\n", "t = tnrange(100)\n", "for i in t:\n", " # Description will be displayed on the left\n", " t.set_description('GEN %i' % i)\n", " # Postfix will be displayed on the right, and will format automatically\n", " # based on argument's datatype\n", " t.set_postfix(loss=random(), gen=randint(1,999), str='h', lst=[1, 2])\n", " sleep(0.1)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## RandomForest\n", 
"\n", "L'article [Combining random forest models in scikit learn](http://stackoverflow.com/questions/28489667/combining-random-forest-models-in-scikit-learn) explique comment fusionner des random forest. L'objectif est d'en apprendre 10 sur n'importe quel jeu de donn\u00e9es en affichant une barre d'avancement. A quoi sert le param\u00e8tre [warm_start](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier) ? Peut-on imaginer tracer la d\u00e9croissance du taux d'erreur en fonction du nombre d'arbres ?"]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": ["from sklearn.datasets import make_classification\n", "from sklearn.model_selection import train_test_split\n", "X, y = make_classification(n_samples=500, n_features=25,\n", " n_clusters_per_class=1, n_informative=15)\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/plain": ["RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=1, min_samples_split=2,\n", " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", " oob_score=False, random_state=None, verbose=0,\n", " warm_start=False)"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.ensemble import RandomForestClassifier\n", "model = RandomForestClassifier()\n", "model.fit(X_train, y_train)"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": 
"text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1"}}, "nbformat": 4, "nbformat_minor": 2}