{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Courte introduction au machine learning\n", "\n", "Le jeu de donn\u00e9es [Wine Quality Data Set](https://archive.ics.uci.edu/ml/datasets/Wine+Quality) recense les composants chimiques de vins ainsi que la note d'experts. Peut-on pr\u00e9dire cette note \u00e0 partir des composants chimiques ? Peut-\u00eatre que si on arrive \u00e0 construire une fonction qui permet de pr\u00e9dire cette note, on pourra comprendre comment l'expert note les vins."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
run previous cell, wait for 2 seconds
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Donn\u00e9es et premi\u00e8re r\u00e9gression lin\u00e9aire\n", "\n", "On peut utiliser la fonction impl\u00e9ment\u00e9e dans ce module."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/html": ["
"]}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}], "source": ["df.plot(x=\"minl\", y=[\"r2_train_dt\", \"r2_test_dt\",\n", " \"r2_train_reg\", \"r2_test_reg\"]);"]}, {"cell_type": "markdown", "metadata": {}, "source": ["On voit que la performance sur la base de test augmente rapidement puis stagne sans jamais rattraper celle de la base d'apprentissage. Elle ne d\u00e9passe pas celle d'un mod\u00e8le lin\u00e9aire ce qui est d\u00e9cevant. Essayons avec une for\u00eat al\u00e9atoire."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## For\u00eat al\u00e9atoire"]}, {"cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["100%|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 25/25 [00:20<00:00, 1.54it/s]\n"]}, {"data": {"text/html": ["
\n", "\n", "
\n", " \n", "
\n", "
\n", "
minl
\n", "
r2_train_dt
\n", "
r2_test_dt
\n", "
r2_train_reg
\n", "
r2_test_reg
\n", "
r2_train_rf
\n", "
r2_test_rf
\n", "
\n", " \n", " \n", "
\n", "
0
\n", "
1
\n", "
1.000000
\n", "
0.030211
\n", "
0.30522
\n", "
0.265853
\n", "
0.920318
\n", "
0.472317
\n", "
\n", "
\n", "
1
\n", "
3
\n", "
0.864086
\n", "
0.130133
\n", "
0.30522
\n", "
0.265853
\n", "
0.836299
\n", "
0.455444
\n", "
\n", " \n", "
\n", "
"], "text/plain": [" minl r2_train_dt r2_test_dt r2_train_reg r2_test_reg r2_train_rf \\\n", "0 1 1.000000 0.030211 0.30522 0.265853 0.920318 \n", "1 3 0.864086 0.130133 0.30522 0.265853 0.836299 \n", "\n", " r2_test_rf \n", "0 0.472317 \n", "1 0.455444 "]}, "execution_count": 23, "metadata": {}, "output_type": "execute_result"}], "source": ["import pandas\n", "from sklearn.ensemble import RandomForestRegressor\n", "from tqdm import tqdm\n", "res = []\n", "for i in tqdm(range(1, 50, 2)):\n", " dt = DecisionTreeRegressor(min_samples_leaf=i)\n", " reg = LinearRegression()\n", " rf = RandomForestRegressor(n_estimators=25, min_samples_leaf=i)\n", " dt.fit(X_train, y_train)\n", " reg.fit(X_train, y_train)\n", " rf.fit(X_train, y_train)\n", " r = {\n", " 'minl': i,\n", " 'r2_train_dt': r2_score(y_train, dt.predict(X_train)),\n", " 'r2_test_dt': r2_score(y_test, dt.predict(X_test)),\n", " 'r2_train_reg': r2_score(y_train, reg.predict(X_train)),\n", " 'r2_test_reg': r2_score(y_test, reg.predict(X_test)),\n", " 'r2_train_rf': r2_score(y_train, rf.predict(X_train)),\n", " 'r2_test_rf': r2_score(y_test, rf.predict(X_test)),\n", " }\n", " res.append(r)\n", "df = pandas.DataFrame(res)\n", "df.head(2)"]}, {"cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAEGCAYAAAB1iW6ZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nOzdeXxU5b348c8zk9ky2XdIIISdsCQgAmVTQBFbS9WKS1tbrlraKvb2tvVWbetFqr0u7a/WqnWtepWqRa3iihugAUFAAlH2ACGBkH3fZnt+f5xJSCCQAJNMlu/79Tqvs8yZM98TyHeePOc536O01gghhOj9TMEOQAghRGBIQhdCiD5CEroQQvQRktCFEKKPkIQuhBB9REiwPjguLk4PGTIkWB8vhBC90tatW0u11vHtvRa0hD5kyBC2bNkSrI8XQoheSSmVd6rXpMtFCCH6CEnoQgjRR0hCF0KIPiJofehCiJ7N7XZTUFBAY2NjsEPpl+x2OykpKVgslk6/RxK6EKJdBQUFhIeHM2TIEJRSwQ6nX9FaU1ZWRkFBAWlpaZ1+X4ddLkqpfyilipVSX53idaWUelgptV8ptUMpNekM4hZC9FCNjY3ExsZKMg8CpRSxsbFn/NdRZ/rQnwMWnOb1S4ER/mkJ8PczikAI0WNJMg+es/nZd5jQtdafAuWn2eU7wP9pw0YgSik14Iwj6aScgiruf383UvZXCCHaCsQol2Qgv9V6gX/bSZRSS5RSW5RSW0pKSs7qw7blV/D3tblsyas4q/cLIURfFYiE3t7fBe02n7XWT2qtJ2utJ8fHt3vnaocWnTeI6FALT6zLPav3CyF6p/r6er71rW8xevRoxo4dy+23337a/d944w127tx5xp+zatUq7rvvvrMNs8XixYt59dVXAXjooYeor68/52N2JBAJvQAY1Go9BTgagOO2y2E188NvDOGjXcXsL67pqo8RQvQwWmt++ctfsnv3brZt28b69et57733Trn/6RK6x+M55fsWLlzY4ZfFmequhB6IYYurgKVKqZeBqUCV1rowAMc9pR9+I5XH1+Xy1KcHuf+qCV35UUII4O63vmbn0eqAHjN9YAT/8+2xp93n0KFDXHrppcyZM4fPP/+cN954AwCr1cqkSZMoKCho930bNmxg1apVrFu3jnvuuYfXXnuNG2+8kenTp7N+/XoWLlzIyJEjueeee3C5XMTGxrJixQoSExN57rnn2LJlC4888giLFy8mIiKCLVu2cOzYMR544AGuuuqqdj9Ta82tt97KJ598QlpaWst1vocffpijR48yZ84c4uLiWLNmzTn81E6vM8MWXwI+B0YppQqUUjcqpX6qlPqpf5d3gQPAfuAp4OYui9YvNszGoskp/HvbEYqr5aYHIfqyPXv28MMf/pBt27aRmpoKQGVlJW+99Rbz5s1r9z3Tp09n4cKFPPjgg2RnZzNs2LCW961bt45f/epXzJw5k40bN7Jt2zauvfZaHnjggXaPVVhYSFZWFm+//fZpW+7//ve/2bNnDzk5OTz11FNs2LABgJ///OcMHDiQNWvWdGkyh0600LXW13XwugZuCVhEnXTTzKGs2HSY5zYc4r8XjO7ujxeiX+moJd2VUlNTmTZtWsu6x+Phuuuu4+c//zlDhw49o2Ndc801LcsFBQVcc801FBYW4nK5TnkDz+WXX47JZCI9PZ2ioqJTHvvTTz/luuuuw2w2M3DgQObOnXtGsQVCr63lMiTOyYKxSby4MY/aplP3hwkhejen09lmfcmSJYwYMYJf/OIX53SsW2+9laVLl5KTk8MTTzxxypt4bDZby3JHw6WDPW6/1yZ0gCWzh1Ld6OHlLw4HOxQhRDf43e9+R1VVFQ899FCH+4aHh1NTc+qBE1VVVSQnGyOsn3/++XOObfbs2bz88st4vV4KCwvbdK90FEug9OqEPnFwNFPSYvhH1kHcXl+wwxFCdKGCggLuvfdedu7cyaRJk8jMzOTpp58+5f7XXnstDz74IBMnTiQ39+RhzsuWLWPRokXMmjWLuLi4c47viiuuYMSIEYwfP56f/exnXHDBBS2vLVmypOXibldSwbrjcvLkyToQTyz6eFcRNz6/hYeuyeTyie3ezySEOAu7du1izJgxwQ6jX2vv30AptVVrPbm9/Xt1Cx1gzqgEhieE8cSnB6QcgBCiX+v15XNNJsWSWUP579d2kLW/lFkjzu4OVCFE73TvvfeycuXKNtsWLVrEb3/72y75vJycHK6//vo222w2G5s2beqSzzsTvb7LBaDJ42XW/WsYmRjOizdNDcgxhejvpMsl+PpdlwuALcTMf8xII2t/KV8dqQp2OEIIERR9IqEDfG/qYJxWM099diDYoQghRFD0uoTudnk5tKP0pO2RDgvXTRnM2zsKKajo+iI4QgjR0/S6hL713UO8+/cdHNlzcj30G2amoYB/ZB3q9riEECLYel1Cn7QglciEUD74x9c01LjavDYwysG3Mwby8ubDVNW7gxShEKIrdFc9dIDs7GzefffdM3rPkCFDKC0tpbKykscee+ysPvdc9bqEbrWHMP+msTTVefjouV1oX9tROktmD6Xe5eXFTXlBilAI0RUCWQ+9I2eT0JsFM6H3ynHo8YPCmbloOOte2su2jw4zaX5qy2tjBkQwe2Q8z64/xI0z07BbzEGMVIg+4r3b4VhOYI+ZNB4uPf2TgQJZDx3glltuoaSkhNDQUJ566ilGjx7NypUrufvuuzGbzURGRvLRRx9x11130dDQQFZWFnfccUebKo3NysrKuO666ygpKWHKlCktNzbefvvt5ObmkpmZycUXX8yDDz54Lj+lM9LrWujNxs5OZtjEeDa9cYBjB9oOVfzJ7KGU1jbxxrYjQYpOCBEogaqHvmTJEv72t7+xdetW/vSnP3HzzcajG5YvX87q1avZvn07q1atwmq1snz5cq655hqys7PbTeYAd999NzNnzmTbtm0sXLiQw4eNIoH33Xcfw4YNIzs7u1uTOfTSFjoYZSrnXD+a4sOb+eDpr7n6t+djd1oAmD4slrEDI3jyswNcPXkQJlNwS1oK0et10JLuSoGoh15bW8uGDRtYtGhRy7ampiYAZsyYweLFi7n66qu58sorOx3Xp59+yuuvvw7At771LaKjozv93q7Sa1voALZQC5fcNI66yibWvLC75U8epRRLZg/lQEkdH+8uDnKUQohzEYh66D6fj6ioKLKzs1umXbt2AfD4449zzz33kJ+fT2ZmJmVlZZ0+brDrn5+oVyd0gMS0CKZdMYwD2SV8te54F8u3xg8gOcrBk5+eXDZTCNE7nW099IiICNLS0lpqvmit2b59OwC5ublMnTqV5cuXExcXR35+fqfql8+ePZsVK1YA8N5771FRUXHS53a3Xp/QATLnDSJ1fCxZr+6j5LDxgwwxm7hpVhqbD1WwNe/kMetCiN7lXOuhr1ixgmeeeYaMjAzGjh3Lm2++CcBtt93G+PHjGTduHLNnzyYjI4M5c+awc+dOMjMzeeWVV9o9/v/8z//w6aefMmnSJD744AMGDx4MQGxsLDNmzGDcuHHcdtttgf9BnEafKM4F0FDr4pV7NhNiNXH1nedjtYdQ1+Rh+n2fMG1oDE9c324tGyHEKUhxruDrl8W5ABxhVubfmE51SQPr/rkHrTVOWwjXT0vlg51FHCipDXaIQgjRpfpMQgcYOCKa8y9LY+8XRez+vBCAH00fgsVs4u9rpS9diL7o3nvvJTMzs8107733Buz4zz777EnHv+WWWwJ2/EDqM10uzXw+zaq/ZlN0oIpFd5xPzEAn976zk6c+O8gLN06RB2AI0UnS5RJ8/bbLpZnJpLj4hnQsdjOrn/4Kt8vLr+aPYli8k9tW7qCqQWq8CCH6pj6X0AGckTYu+o90yo/WkfWvfdgtZv5yTSYltU3cverrYIcnhBBdok8mdIDB6bFMWpDKzqyj7NtcxISUKJbOGc7r247w/leFwQ5PCCECrs8mdICp304jaWgka1bsprK4nqVzhzM+OZI7//0VJTVNwQ5PCCECqk8ndJPZxPybxmIyKVY/9RV4Nf/v6gxqmzzc8foOgnVBWAhx5rqrHvqqVau4777g1a45F306oQOEx9i56D/SKS2oZd2KPQxPCOO/LxnFR7uKWbm1/dKbQoieJ5D10D0ezynft3Dhwg6/LDri9XrP6f1nq9dWWzwTQ8bHMeWyNL546yAJQyK44YI0PtxZxPK3djJ9WCwp0aHBDlGIHu3+L+5nd/nugB5zdMxofjPlN6fdJ5D10G+88UamT5/O+vXrWbhwISNHjuSee+7B5XIRGxvLihUrSExM5LnnnmPLli088sgjLF68mIiICLZs2cKxY8d44IEHuOqqq9r9zLVr13L33XczYMAAsrOz2blzJy+++CIPP/wwLpeLqVOn8thjj2E2m3nmmWe4//77GThwICNGjMBms/HII4+c2w+UftBCbzb50iEMmRDH+pX7OHagij8tykBrza9Xbsfnk64XIXqqQNVDb37funXr+NWvfsXMmTPZuHEj27Zt49prr+WBBx5o91iFhYVkZWXx9ttvd9hy/+KLL1rqzezatYtXXnmF9evXk52djdlsZsWKFRw9epQ//OEPbNy4kQ8//JDduwP3RdkvWugAyqS46D/SWfm/m1n95Fdcfef53PXtdH7zWg7PbTjEDTPTgh2iED1WRy3prhSIeujNWj+soqCggGuuuYbCwkJcLhdpae3ngMsvvxyTyUR6ejpFRUWnPf6UKVNajvPxxx+zdetWzj//fAAaGhpISEjgiy++4IILLiAmJgaARYsWsXfv3jM6j1PpVAtdKbVAKbVHKbVfKXXSV5RSarBSao1SaptSaodS6psBiS7AbI4QLv3JeFxNXt5/MofvZiYzb3QC97+/m/3FUutFiJ4oEPXQ2zvWrbfeytKlS8nJyeGJJ56gsbGx3ffYbLaW5Y4GUrQ+vtaaH/3oRy311/fs2cOyZcu6dDBGhwldKWUGHgUuBdKB65RS6Sfs9jvgX1rricC1QHCekNoJsclhzL1+NMcOVLP+1f3873fHE2o188t/ZeP2+oIdnhDiNM62Hnp7qqqqSE5OBuD5558PWIzN5s2bx6uvvkpxsfGQnfLycvLy8pgyZQrr1q2joqICj8fT8rzTQOhMC30KsF9rfUBr7QJeBr5zwj4aiPAvRwJHAxZhFxgxOZHMiwfz1bojlOVUcO8V49lRUMVja6SAlxA91bnWQz/RsmXLWLRoEbNmzSIuLi7g8aanp3PPPfcwf/58JkyYwMUXX0xhYSHJycnceeedTJ06lYsuuoj09HQiIyMD8pkdFudSSl0FLNBa3+Rfvx6YqrVe2mqfAcAHQDTgBC7SWm9t51hLgCUAgwcPPi8vLy8gJ3E2fF4fqx7ezrHcKq68bRJ//DyXt3cU8u+bZzA+JTA/XCF6MynO1XVqa2sJCwvD4/FwxRVXcMMNN3DFFVectF9XFOdq76F5J34LXAc8p7VOAb4JvKCUOunYWusntdaTtdaT4+ODW/XQZDZxyU1jcYRbeO+JHO6YO4q4MBv/9a9sGt3BGUMqhOgfli1bRmZmJuPGjSMtLY3LL788IMftzCiXAmBQq/UUTu5SuRFYAKC1/lwpZQfigB79hGZHuJVLfzqe1x/8ko3/3MP9V47nR89t5k+r9/C7y068TCCE6InuvffelmeFNlu0aBG//e1vu+TzcnJyuP7669tss9lsbNq0qdPH+NOf/hTosIDOdbmEAHuBecARYDPwPa311632eQ94RWv9nFJqDPAxkKxPc/Cuqod+NnauP8qaF3Yzcf5g3jU1sGLTYV768TSmDY0NdmhCBI10uQRfwLtctNYeYCmwGtiFMZrla6XUcqXUQv9uvwJ+rJTaDrwELD5dMu9p0mcMZOysgWz74DDXDYhjcEwov165nZpGqZ0uhOg9OjUOXWv9rtZ6pNZ6mNb6Xv+2u7TWq/zLO7XWM7TWGVrrTK31B10ZdFeYdfVIEtMiyPrnXu6ZO4qjlQ38/KVtMpRRCNFr9Jtb/ztitphYsGQ8FquJvDfz+MM301mzp4TbX8uRqoxCiF5BEnorYdE2FiwZR3VpI1E5NfzXvBG89mUB97+/J9ihCSFEhyShn2DgiGhmfHc4h3aUMv6Yl++fP4jH1+XyTNbBYIcmRL/WXfXQAbKzs3n33XfP6r3BJAm9HRPmpnDeglR2rS9k6hEf3xyVyB/e3smb2UeCHZoQ/VYg66F35EwT+unqq3enflNt8UwopZh2+TAi4hys/eceZiWFUpMcza9XbifGaWXWiODeFCVEdzv2xz/StCuw9dBtY0aTdOedp90nkPXQAW655RZKSkoIDQ3lqaeeYvTo0axcuZK7774bs9lMZGQkH330EXfddRcNDQ1kZWVxxx13tKnS2GzZsmUcPXqUQ4cOERcXxwsvvMDtt9/O2rVraWpq4pZbbuEnP/kJPp+PpUuXsm7dOtLS0vD5fNxwww2nrKt+LiShn0b6zIGERdt4/6mvmFNnpinayU9f2MrLS74h5QGE6CZ79uzh2Wef5bHHjtf8a66H/p//+Z/tvqe5Hvpll13WkjjnzZvH448/zogRI9i0aRM333wzn3zyCcuXL2f16tUkJydTWVmJ1Wpl+fLlLQ+5OJ2tW7eSlZWFw+HgySefJDIyks2bN9PU1MSMGTOYP38+W7du5dChQ+Tk5FBcXMyYMWO44YYbAvcDakUSegcGj43lyl+fxzuPbmfeUY0vysriZ7/g1Z9NJy3O2fEBhOgDOmpJd6VA1EOvra1lw4YNLFq0qGVbU5PxoPgZM2awePFirr76aq688sozim3hwoU4HA4APvjgA3bs2MGrr74KGNUc9+3bR1ZWFosWLcJkMpGUlMScOXPO6DPOhCT0TohLCeO7/z2Zdx7bztwjtXwWofjhPzbx2s+mkxBuD3Z4QvRpgaiH7vP5iIqKIjs7+6TXHn/8cTZt2sQ777xDZmZmu/t0JjatNX/729+45JJL2uzzzjvvdPp450ouinZSWLSNK341iUFjYplZaWJEoZfFz3whd5MK0Y3Oth56REQEaWlpLTVftNZs374dgNzcXKZOncry5cuJi4sjPz+/w1rq7bnkkkv4+9//jttt5IS9e/dSV1fHzJkzee211/D5fBQVFbF27dozOu6ZkIR+Bqz2EL5183jGzk7mvAYzww808dPnt9DkkeqMQnS1c62HvmLFCp555hkyMjIYO3Ysb775JgC33XYb48ePZ9y4ccyePZuMjAzmzJnDzp07yczM5JVXXulUfDfddBPp6elMmjSJcePG8ZOf/ASPx8N3v/tdUlJSWrZNnTo1YPXPT9Rhca6u0pOKc50prTXbPjzM56/nUmD2UjclmoeuPw+Tqb1Kw0L0TlKcK3Ca65+XlZUxZcoU1q9fT1JSUofvO9PiXNKHfhaUUkyan0pErIPVz3xF+aZK7rFs5/fXZaCUJHUhRFuXXXYZlZWVuFwufv/733cqmZ8NSejnYPh5CTijzuPVh76k8bMyHg35mqVXjwt2WEL0K11dD/3ZZ5/lr3/9a5ttM2bM4NFHH+30Mbqy37w16XIJgIpjdTx3/2ZMDV4a0sP55c3nYQ0xBzssIc6JdLkEX1c8gk50IDrJyY+XfQNfgp3wnbX8752fkX/szK6QCyHEuZKEHiBhkTZ+cfd0IqfHE13t5aU/fMHarPxghyWE6EckoQeQUoof/HA8U3+cjtekyHlxL888vg2fPCRDCNENJKF3gannDeCmu6dRFmuhMbuCv/w2i7KSumCHJYTo4yShd5H4mFB+/4dZNJ0XhanSzf8t28TWDVJ+V4iz1V310FetWsV99913tmG28fDDDzNmzBi+//3vB+R4HZFhi13IbFL88seTWDUmj60v72Pj/+0hN6eUK28YR4hFRsGI3uOzf+2lNL82oMeMGxTGrKtHdnr/5nro8+bNw+VyMW/ePN577z0uvfTSdvd/4403uOyyy0hPTz/pNY/HQ0hI++lv4cKFLFy4sNNxtcfr9WI2m3nsscd47733SEtLO6fjdZa00LvBwpmpfO+O89kbBSXbynjyrs8pOxrYXw4h+qJDhw4xZswYbr75ZmbOnMnw4cOBztdDv+2228jMzCQ3N5cLL7yQO++8kwsuuIC//vWvvPXWW0ydOpWJEydy0UUXUVRUBMBzzz3H0qVLAVi8eDE///nPmT59OkOHDm2ppNietWvXMmfOHL73ve8xfvx4fvrTn3LgwAEWLlzIX/7ylwD/ZNonLfRuMio5kj8sm8Wyx7eQtKeef97zBbOvGcmE2clyd6no8c6kJR1ogaqH3vy+devWAVBRUcHGjRtRSvH000/zwAMP8Oc///mkYxUWFpKVlcXu3btZuHDhaR9M8cUXX/DVV1+1tMjff/991qxZQ1xc3Fmd+5mShN6NIuwW/vTzaTzy7h7y388n66W97M8uYd61o4hKDA12eEL0SIGoh96s9ZOHCgoKuOaaaygsLMTlcp2yW+Tyyy/HZDKRnp7e0oo/lSlTpnRb90p7pMulm5lMip9fNpoFt0xgY5iX/N3lrFi2kTX/3E19tSvY4QnR4wSiHnp7x7r11ltZunQpOTk5PPHEEzQ2Nrb7HpvN1rLc0Z31J8ba3aSFHiRzxyQy4s4ZLH91B/qranyfHmX3xmNMviSVjHmDsNrln0aIEzXXQz9d2dxmHdU0r6qqIjk5GYDnn38+YDEGk7TQg2hQTChP/ngqVy0Zz1sDNbu0my/eOsgLv/+crz49glduSBKixbnWQz/RsmXLWLRoEbNmzeq2Pu6uJsW5eoi6Jg8PfbSX99bmMafJSpJLEZUYyjcuH0ZaZpxcOBXdTopzBZ8U5+qlnLYQfvutdJ7+1Qx2jbbzb2cTRyobeO+JHF5/8EsK91cGO0QhRA8nHbU9zOikCP710+m8urWA+97dxeAqzbwjNbz+py9Jy4jjG1cMIzopuBdehOhJuroe+olycnK4/vrr22yz2Wxs2rSpSz7vTEiXSw9WUefivvd28/rmfOZgJ7PeDF7N8PMSmDA3haS0rnkuoRAgXS49gTyCrg+Jdlq5/6oJLJqcwu/e+IqsozVc5YzAtKOUfZuLSEyLYMLcFIZNSsBslt4zIfo7yQK9wOQhMbx160x++e0xvOat48/2Wg4OtlJR0ciHz+zkhTs3sOXdgzTUyDh2IfqzTiV0pdQCpdQepdR+pVS7Jc6UUlcrpXYqpb5WSv0zsGEKi9nETbOGkvWbufzXpaPIool7dRVZAxSNTjObVh3k+Ts28PH/7aK0QJ6WJER/1GGXi1LKDDwKXAwUAJuVUqu01jtb7TMCuAOYobWuUEoldFXA/V2008rNFw7nx7OG8t5Xx3gm6yD35ZczOC6EK5yR7NtcxO4NhQwcEcWEuSmkZcRjMsmQRyH6g8600KcA+7XWB7TWLuBl4Dsn7PNj4FGtdQWA1ro4sGGKE1nMJhZmDOTNW2bw+s3TmZAezyM15TzsrKcozU5pcR3vP/EVL/7uc778IE/KCoher7vqoQNkZ2fz7rvvnvH7rrvuOiZMmNBt1RVP1JmLoslA64djFgBTT9hnJIBSaj1gBpZprd8/8UBKqSXAEoDBgwefTbyiHZMGRzPpe9EcqWzg/z4/xEubDlODh4sGhTPFo/j89Vw2vnGAwWNjGDU1ibSMOKnHLs7ImueepDjvQECPmZA6lDmLl3R6/0DWQ+9IdnY2W7Zs4Zvf/Gan9vd4PJSWlrJhwwby8vLO+PMCpTMt9Pb+Xj9xrGMIMAK4ELgOeFopFXXSm7R+Ums9WWs9OT4+/kxjFR1IjnJwx6Vj2HjnPJZfMY5cu497G8p5I8mHe7iTorwaPnj6a5797/WseXE3R/dXdlhsSIhgCmQ99NzcXBYsWMB5553HrFmz2L17NwArV65k3LhxZGRkMHv2bFwuF3fddRevvPIKmZmZvPLKK+1+xrJly1iyZAnz58/nhz/8IfPnz6e4uJjMzEw+++yzrvmBdKAzLfQCYFCr9RTgaDv7bNRau4GDSqk9GAl+c0CiFGck1BrC9dNS+f6UwazbV8LzGw7x0N4STCb49pgYzsPK3s1F7Mw6SkScnVHTBjBqahKR8Y5ghy56qDNpSQdaoOqhz5s3j8cff5wRI0awadMmbr75Zj755BOWL1/O6tWrSU5OprKyEqvVyvLly9myZQuPPPLIaWPbunUrWVlZOBwODh06xGWXXUZ2dnbgTv4MdSahbwZGKKXSgCPAtcD3TtjnDYyW+XNKqTiMLpjA/n0mzpjJpJgzKoE5oxLIL6/n5c2HeWVzAW/UlpOa6OCqxBjsFV42v3OQzW8fZMDwSEZNTWL4eQnYQi3BDl8IIDD10Gtra9mwYQOLFi1q2dbU1ATAjBkzWLx4MVdffTVXXnnlGcW2cOFCHI6e0xDqMKFrrT1KqaXAaoz+8X9orb9WSi0HtmitV/lfm6+U2gl4gdu01mVdGbg4M4NiQrntktH84qKRfLSziBWbDvPnfUcwmxSXnhfHhXYnDftqWLtiD5+9so+0jDhGTUticHoMJrlpSQRRIOqh+3w+oqKi2m09P/7442zatIl33nmHzMzMM2phB7v++Yk6daeo1vpd4N0Ttt3ValkDv/RPogezmE1cOn4Al44fwKHSOl7afJiVWwp4u66EQdEOrr04kbR6EwXZpezfWowj3MLIKUmMmpZEXEqYVH0UQXW29dAjIiJIS0tj5cqVLFq0CK01O3bsICMjg9zcXKZOncrUqVN56623yM/P77CWek8lTa9+bEickzsuHcPnd8zl4esmkhzt4MHNh7hl10G+zAwl7fJUBgyLImdtAf+6dzOv3PMF2z44TF1VU7BDF/3QudZDX7FiBc888wwZGRmMHTuWN998E4DbbruN8ePHM27cOGbPnk1GRgZz5sxh586dp70o2hNJcS7Rxv7iWl764jCvbi2gqsHN8IQwvpeZzDivhcNfllB0sBqlYFB6DKOmJZGWEY/FKkMg+yIpzhV8UpxLnJPhCWH8/rJ0brtkFG9tP8qLmw6z/IM9OCxmvpM5kCsXjMV3qJY9m47x4VWUCUAAACAASURBVDM7sdjNDJ+UwKhpSQwYHiV3pQoRRJLQRbvsFjOLJg9i0eRB5BRU8eLGPN7IPsLLm/PJGBTF968YxGynkwNbitm/tZhdGwqx2MwkDIkgKS2CpKGRJKZF4Ai3BvtURB/X1fXQn332Wf7617+22TZjxgweffTRgBw/kKTLRXRaVYOb178s4MWNeeSW1BHpsLDovBSumZQCRxso3FfJsYPVlBbUon3G/6uIeAdJaREkpkWSNDSC2JQwKfXbS+zatYvRo0fLhfAg0Vqze/fuM+pykYQuzpjWmo0HynlxYx6rvz6Gx6eZOTyO72QOZM7oBCKtIZTk1XDsYBVFB6s5dqCK+iqjlozZYiIhNdxI8GkRDBwRJa34HurgwYOEh4cTGxsrSb2baa0pKyujpqaGtLS0Nq9JQhddpri6kVc25/Py5nyOVDagFGSkRDFvdALzxiQyZkA4ALUVTRw7YCT4ooNVFB+uwecx/u/FDHSSPDKa5FFRRoIPkwTfE7jdbgoKCmhsbAx2KP2S3W4nJSUFi6XtTX6S0EWX01rz9dFqPtldzMe7i9mebzzUekCknbmjE5g3JoHpw+Kw+4uCed0+SvJrOLK3giN7KyncX4nH5QMgNjmM5JFRJI+KZuCIKOxOuWtViGaS0EW3K65pZO3uEj7eXcRn+0qpd3mxW0zMGBbHvDGJzB2dQFKkvWV/r8dHcV4NR/ZUcGRvBcdyq/C4faAgLiWM5BFGC37AsCjsYZLgRf8lCV0EVZPHy6YD5Xyyu5iPdhVRUNEAQPqACOaMjufCUQlMHBRFSKuLpV63j6JD1f4WfAXHcqvxeowWvDPKRmxyGLHJTmKTw4hLCSMqMRRziFxsFX2fJHTRY2it2Vdcy8e7ivlkdxFfHq7E69NE2EOYNTKeC0fGc8GoeBLC7W3e53F7jf73Q9WUH6mj9EgtFYV1+LzG/1+TWRGdFOpP9GHEpoQROzAMZ5RVLuiJPkUSuuixqhrcZO0rZe2eYtbuLaGkxigrMHZgBHNGJXDhqHgyT2i9N/N6fVQeq6fsaC1lBXWUHaml7EgttRXHSxPYnCEkDolgwLAoBo6IJCE1ghC5s1X0YpLQRa/QfGF13d4S1u4pbrf1PmtEfJu+9/Y01rkpP1pLaUEdZQU1HDtYTfnROgBMIYqEwREMGB7JwOFRJA2LlIuuoleRhC56pVO13hMjbGSkRJExKIqMlCjGp0QS6Th9Um6sc1OYW0XhfmNETXFeTUt3TcxAJwOGRzFweCQDhkcRHnP6LwwhgkkSuuj1mlvvmw+Vsz2/ku0FVRwsrWt5fWi800jyKZFkDIpizICIliGS7fG4vBQdqqZwvz/JH6jC3egFICzaRmSCg/AYO+GxDiJi7f5lO2HRNqkPL4JKErrok6rq3ew4UtmS4LPzK1ta8RazYnRSBBNSIkmLc5IQYScx3EZihJ2ECBuh1rZljHw+TVlBLYW5lRQdrKa6tJGasgbqql1tnqCrTApnlJWIWAfhrRJ9ZLyDmIFOuSlKdDlJ6KJf0FpzrLqR7flVbC+oZEdBJTvyq6hp8py0b7gthIQII8EnRthJCLcZST/CRlKEndRYJ3FhVnweTU1FIzVljdSUG/PqsgZjvayRusomWv8KOcItxAx0EjMwjJgBTmN5gFP66UXASPlc0S8opRgQ6WBApIMF45IAI8lXN3goqmmkuLqJourGluXimkaKqpvYfKic4pomXP5x7s3CbCGkxTnbToNjGB3nbOmz93p91FU0UVlUT3lhHeVH6ygvrGP3hkLcTd6WYzkjrccTvT/JR8Q5sIdZpOSwCJhe10L3etyYTGaUSfoxReBoralqcFNU3cTRqgbySus4WFrHwbJ6DpbWUlDR0KYlHuu0tiT5IXFOhsU7GZ4QRmqsE4vZhPYZLfvyo8eTfPnROioK64w7YP2USeEItxAaYSU0woYz0mosRxrrof51Z6QNi02GW4o+1uWy/cN3Wf+vFaSOzzSmCRMJj43rggiFOK7R7SW/vN5I8v7pQGkdh0rrKK45Pu49xKRIjQ1leELY8Sk+nGEJTkKtIfh8mpqyBsqP1lFb0UR9tYu6KmNeX+WivqqJ+hp3S/nh1iw2M84om5Hgo4zk74wykr4z0oYz0li22uUP776sT3W5xCQPIi1jEnk52exev65lW+qETFLHT2TQ2PFY7Y4gRyn6GrvFzIjEcEYkhp/0Wm2ThwMltewvPj7tK67lo13FeFsl5uQoB8MSwhgebyT6ISmhJEdHMyDSgbVV2QLt0zTWuamrclFffTzZ11U1tcyLDlVTX9nUprXfzGI3Gwk+ykpYtJ3opFCik5q7eewySqcP63Ut9GZaa0rz88jb/iV5OdkU7Poaj6sJkzmEgSNHkzphIqkTMkkcOhyTSf5UFd3P5fGRV1Z3PNH7k35uSS2NrbtdFCSG20mJdpAc7SA5ykFKdCjJ0Q5jW5Sj3SGYWmtcDR7qmpN9ZZOx7J/XVzVR7b9w28wUoohKCG1J8tED/PPEULmDtpfoU10up+JxuTiyZyd5Odnkbd9G8aFcAOzOMAaPy2DY+dMYMXU6FqstYJ8pxNnw+TRHKhvIL6+noLKBgooGjlQ0cKSynoKKBo5VNeI5ocslLsxGcrSDhHAb8eG2lnl8mDE6Jz7cRlyYFVvIyUm5qcFDxbE6KgrrjfmxeioK66gubXVdQEFErJ3oJCfhsXZCI6w4wv39+f7JEWGVB4L3AP0ioZ+ovrqKwznZHNqxjbwd26gtL8MW6mTMrAsZP/cSEoYM7bLPFuJceH2aoupGI9FX1vuTvZH4S2qaKKlpoqzO1e57Ix2W48k+3EZ0qJVQq5lQqxm7xYzDv+ywmLEqhanOi7fChbuiicbyJmpLGmisctFUf/JQTzD68R0RVkLD/Rduw41EHxZlwxllIyzamNtCQ6QoWhfplwm9Ne3zUbDrK3I++YC9m9bjdbtJHDqc8XPnM3rGBdhCnd0ShxCB4vb6KK9zUVzdREltIyU1Tf5lI+EX+xN/RZ2LBrf3pBb/6VjMikGRdoaGhzIo1Eqi1UKMOYQwFHYPeOo91Ne4qK9201DtorHOfdIxQiymNgm+9XJYlB1nlPEXgJQ8PnP9PqG31lhby66sNeR8vJqSw4cIsdoY9Y2ZjJs7n+RR6dKqEH2S2+ujwe2lweWf3F7qXV4a/dvq3V4aXV7qXR6Kapo4XF5Pfnk9h8vrqaxvm7BjnFYGxYQyOCaUwTEOUiIcRJtMhPoUNpfG1OjFVeOmtrKJusomaiuaqKtqannkYGu20BAc4VZj6Ga4tWXZEX68y6d5XVr9Bkno7dBaU3RgPzmfrGb3+nW4GhqIGZjC+LnzSZ89l9DIqKDFJkRPUtXgbknuLVOZMT9S2dBmJE8zu8VErNNGbJiVGKeVmFAL8VYLUZgJ02Bzg27w4m3w4K334PEve+o9eBu87URhPGA8PNZOZJy/vo5/HhFnlGHoLwlfEnoH3I2N7NmYRc4nH3B0z05M5hCGTZ5C0rCRRMTFEx6XQHhsHGHRMZjMclFIiGYer4/CqkbK6lyU1Rp9++UnLfvndU1tRvecitIQqiFUK0J9ilCtcPoU4VoR6VPEYCLCq7CccKgQm5nwWDtR8Q4iYh2ExdgwmRXaZ1yI1j6N1sbc5zOGhzZv8/lAezUms8IZbSM82k5YjI2waDuOcEuP+qKQhH4GygryyVnzAbuz1lJXWdHmNWUyERYTayT52HjC4+KJaJ7HxRMRn4gtNDRIkQvR89W7PJTVuqhudKM1+LTG559rrf3bmrfrln28Pk1lvZuCinr/xeIGikrrqatoItSlifSZiPQZCT/an/BDOpHalAlMJhPKZNy16/PolkcdNjNbTIRFG8k9PKZ5fjzhOyOtWB3d99eBJPSz5Gqop6aslOrSEmpKS4x5mX+5rISa0lJ83lajAZQiaehwUidMYkjGRAaMGI05pNfduyVEr+HzaUprm8hvGQlkjAoqKK+npLyBqnoP1Y1umnw+NOCDljn+/KsURNgtRDosRNjNRJjMhPsU4V6F0wsON1hdPkIafZgafah2uoSUCayhFhxhxrUAu9OCPczSMne0WrY7LTijzr6UgyT0LqJ9Puqrq6guLaamtITS/MPk5WRTuG832ufDYncweNwEUidMZMiEiUQlDexRf7oJ0R9orWlwe6lqcBtTvfv4coOb6lbLNY0e4+Kx/2Jxo//icYPbWHZ7NSYNYf4uoAif0R3k0ODQCodWOFE4MeHQYPGCqZ0UO+E7acy6NO2szuecb/1XSi0A/gqYgae11vedYr+rgJXA+Vrr3p2tO0GZTDijonFGRTNg+ChGToPpi75HY10t+V/vIG/HNg5t/5LcLZsAiIhPZMiEiaRmTGTwuAzszrAgn4EQfZ9SilBrCKHWEAZEnltZELfXZ4wMcntpdPmod3uoaTS6kSrqjWsF5XUuSv3z8tomamrdNNS6UC5fS9IPM3mYFaDza63DFrpSygzsBS4GCoDNwHVa650n7BcOvANYgaUdJfS+0ELvrMpjhRzyJ/f8r7fjamhAKRNJI0YycMRoohIHEJmYRGRCIhHxiYRYpHa2EH1Ng8tLeb2LijoXif67e8/GubbQpwD7tdYH/Ad7GfgOsPOE/f4APAD8+qyi7MOikgaQmTSAzPnfxOvxULh/D3k7tpG3fRvbP3gXj7vVXX9KERYTS1RCEpEJSUQmJhrLicZ6aGSUdNsI0Qs5rGaSrUZtnq7SmYSeDOS3Wi8AprbeQSk1ERiktX5bKSUJ/TTMISGkjB5LyuixzLj6B2ifj7qqSqqKjlFVfIxK/7yq+Bh5O76ktqK8zftDbDbiBqWSmDaMhLRhJKYNJ3ZQqrTqhRCdSujtNQdb+mmUUibgL8DiDg+k1BJgCcDgwYM7F2Efp0wmwqJjCIuOIXl0+kmvu11NVBcXtyT7yqKjlOYdYlfWOrZ/+B4AJrOZ2DZJfhjxg9Ow2OXp9UL0J53pQ/8GsExrfYl//Q4ArfX/+tcjgVyg1v+WJKAcWHi6fvT+1IfeFbTPR1VxEUUHcyk+uJ+ig7kUHcylsaYaAKVMxCSnkJA2jITUNGJSBhEzIIWIhAQpJyxEL3ZOwxaVUiEYF0XnAUcwLop+T2v99Sn2Xwv8Wi6Kdj+tNTVlpRT7k3vxwf0UH8xt021jtliIShxAzMAUYpJTiB6QTMzAFKIHJsuoGyF6gXO6KKq19iillgKrMYYt/kNr/bVSajmwRWu9KrDhirOllDLuWI2LZ/j501q211dXUX60gIqjRyg/WkD50QJK8/PYv2Uj2nf8rrjQyCgj0Q9MISI+AXtYOI7wcOxhxydHWDghNptcmBWiB5Ibi/oxr8dNZdGxlkRfUXiE8iMFlBceaem6aY/ZYjESvDMMR3gE9rAw7GHhmExmtPb5b+HWoDXad3y9Zbt/m1IKZ0xMS/kEo5xCHM6oaOkWEuIU+tQzRUXgmEMsxCYPIjZ50EmvuV1NNNbW0Fhba8xramiorfFvM6aGmhoa62qoLDpGY+4+tM9ntNyVQikTyqRQrZbBv24yoZTC6/VycPuXuBsb2ny2yWwmLCaW8Ni4E2rmxGELdeJxufxT08nLbmPZ3dS87iYsOoa4QanEDUolNmUQVofU2xF9kyR00S6L1YYlxkZ4TFyXfo7Wmqb6OmrKSqnx18oxauYY64X7drN34/q2NXNOQ5lMhFhthFitxmSxcGDrF3hcx5+rGRGfSNygwceT/KBUYpIHydBP0etJQhdBpZTC7gzD7gwjfvCQdvdpHqtfU1aCq77heLJunbj9y+0VQ2seEVSan9cyleXncWj7l/i8RqElZTIRnTSQuEGpxCSnYHWEGsezWLFYrZhbviCsLdtbr4dYbVjtdpRJnsAjgkf60EW/5fW4qSg8aiT4gsOUHs6jrCCPimOFcJa/FxabHavDgcVux2oPxepw+NeNudVux+Lfbg8LIzQiEkd4JI6ICEIjIrE6QuWCszgt6UMXoh3mEEtLt0trPp+3Vd+8C6/b1ap/3oXX5cLtNubN/fTupkbcjQ24Ghpw+efuRmO5tqL8+GsNDbibGk8TUwiO8AgcEZHGFG4kekdEBI6wCEwhZuOahP9aBNByTeL4NtVy/cJiteGMiSUsJha7MyxgXxauxgajW6yslIbqKuOvm9Q06bYKMknoQpzAZDJjtTuw2rum5ob2+XA1NvovLFfTUF1FfXWVMa+ppqG6moYaY1t1SREN1dU01ded8+eGWG3GXcn+BB8WE0tYdCzhscfXnVEx+DweaspLqSktNeZlJdSWlRnPAig35k11J8djMocQn5pG0rDhJA4dQdKwEcSmDJanfHUj6XIRohfwetw01ta2GgbqO/5ItZZl3/GhohhfHO7GRmoryqktL6OmvJTa8jJjqjDmXre7w88GcERE+kcdxR0ffRQTS3hsPPbwcMqP5HPswH6KcvdSdCC35QsoxGojYchQEocNJ2nYSBKHDidmQPJprzX4fF68LrfxF5Hb1bKstfY/XciY2i6b291uDgnpc0Ng5QEXQoiTaK1prK1pSfLNCd8cYvHfFxBHeEwcYTGxhFitnT+uz0dlUSHHcvdRdGCfMT+Yi6fJGGlkdTiIShqIz+v1d2e5j3dnuV0tF6oDxWQ2GxexLRbMFgshFitm/3Lr9RCLBYvNjtNfW8kZHUNYVAzOGGMeqNpIzV/CZ/tFIwldCBFUPp+X8gJ/K/7APqqKizCHWPwjhiz+0UIWf+L1b7Na2yRfpUxon9f/oGcfPp/P+GukZdnbZrvP68Xr8eD1uI0vDren5XqIsc2Nx2281rzsamigrrK83b9crI5Qf5dVDM4of8KPjsUcEoKrsfmayfFrKe6mxlbXUxpbrqm4Ghu5+Me3MGHegrP6WcpFUSFEUJlMZuIGDyFu8BDGXXhRsMM5La01jXW11FWUU1tRfsK8jLqKCo7s2UVdRRleT9v7I9qOZjKuwzijorHa/SOd7MYoqITUoV0SuyR0IYRoRSmFw1+36MQRUK01d1n5vF6sdgchVmvQ70OQhC6EEGdBKYUjPCLYYbQht7UJIUQf0eta6Pd/cT+7y3cHOwwhhDhro2NG85spvwn4caWFLoQQfUSva6F3xbeaEEL0BdJCF0KIPkISuhBC9BG9rstFiF7H6wFPA7gbweOf3A2gFJitYAox5mYrmC3+yQomC7Q3rtnTBE010FTtn/unxuqTt4XYIHUGDJkB9sjuP/eeSmvwusBVB+56cNWDu874d/F5/JP3+LLX3Xa99WS2giMKHNFg988dUWCLbP/frwtJQhfdz+uGrc/Bzjdh+DzI/D6EJQQ7KiPxNpRDfbnxi+6q8c/rwFV7iuXm9fp2knajsc3XuacttUuZjyd7k8n4PK+r4/eZQsAWYSSrzx8BZYKBEyHtAkibDYOngaVrqkmeMZ8PfG5/0vQnzpZlj/Hv4nMf/yJ0Nxjn5W7wJ+QTtrnrT3i9VcJ2+V9z1YEObM2YkynjS7Q5wbdO+OOvgtTpgf9EqeUiuo3WsOdd+PAuKNsPUalQmWckn5ELYNKPjAQfyOp4tcVQlQ91ZVBfBvWlxryu1EjcrdcbKzt3zBAHWJ3+Kcw/DzW2W+wQ4p8sjrbzELv/df9+YCRnrz+ZNS/73KfY7jE+yxZuJGt7hH85/Pi25uUQu/EXgKcJCjbDgXVw8FM4suV4q3LQ1OMJPnmS8ZfBmfC6obHKP1VCQ2Wr9ROnE15rqj2exAlQDjKFgMVp/LybJ6sTLKGt5qHGPtZQ/3pYq2Wn8XMzW4xjmSzG/0VTSKupnXWvCxoqjPNvqPD/LE5cP+G1i/8AE79/VqfZp4pzHfvjH2naJePQex1XLZQfNH6ZLaEQMwQcMUYrqvaYkXi9bqOLICzRmEJsZ/45Pk+rBFNltMZOpJTxy9r8i9vczdF6m8lstI5PmhsPkOi1tNfommlOMi5/XXOT2fiCsEcZPwNvO10LJ3U5dNDCVerUiVCZjdeV8fDw48umdrY1L5v8k9mYm8zHt5nM9KZ/F9uY0STdeedZvVeKc4ng8TRB5SGoLTESRewwCE+i5ZfP4oDoNIgaAg1lUFMElYeNyREN4YngiPX/YrejdYJqbvmB8Qtui4DwBOMLxGQBc3Orqx//t1dmfxdANETj/wL0f/k1VkL9wbb7n5iIQ+zHl82narW2+kIU3arX/c8+22810c0aqyDrL/D5Y0Yy/sYtMOMXRiuwI5WHYdsK2PYiVK+H0DjIvA4m/hCiU6Fgi9F9cHCdsexzG10IKVOM7oO02ZB8HoR0voa38KsuNP6qsUcZ/b/mXpci+rVe1+UierjmC55r/9fom55wLcz9HUQNOvNj+byQ+wl8+Tzsec9oTYbYjYtjygQDMo3kPfQCGDTN6AsVoo+TLpf+zFUHR748frW/eaSAp9E/EqCx/W0ms//KfAyExrQ/d0QfbwVrbSTdD++Csn2QOhMuuccYWXG2TGYYcbEx1RTB9peMvvYhM4yheI6owPyMhOgjJKF3F62NC4N1JcaIiroSY7JHwpDZ4IwN3Gc11cK+1fD1G7DvQ2Po3Kko0/FRF5bQ4yMxfF5jFEhD+emHyVnDITTa6Jsuz4XYEXDtSzDq0lP3e5+N8ESY+YvAHU+IPkgSeqDUFhv9urVFx5N1bUnbBH7KxKpgwAQYOgeGXgiDv3F8WFtntZfEnQnG0KiRC4xWdUir4Vwh/gRutpw+8WpttNqbk3ubecXx9cYqmPYzOG/xmQ9/E0IEhCT0c+H1wP6PYNsLsPf94zeQmCzgjAdnnDGPH3V82RlvJFpnnDHVHIPcNXBgDXz+KKx/yEi2g6cZCX7YHEgc3/4dZ021xufubE7ijcZwv4k/gLGXG18M5zqmW6njY67Pph9cCNFt5KLo2SjLNUZgbH8JagqNJJ1xLYy7yhiFYY86u+6GplrI22Ak99w1ULLL2B4aa9wAMvRC4+6ywu3w9b+NLxNPI4QlQfpCSL/c+CII5I05QogeRS6KBoKrHnatgi9fgLwso+95+MXwzQeNLo1AdDPYwmDkfGMCo/V+YK0x5a6Br18/vm9YknFn5djLjREeQX6WoRAi+CShn47WcHSb0aWS8xo0VRk3wcz9PWR+DyIGdu3nhycZLf+Ma41YSnbD4Y0QP9q4bVuSuBCilU4ldKXUAuCvgBl4Wmt93wmv/xK4CfAAJcANWuu8AMfafbQ2xlJvfgaKcow+7fTvwMTrjeFywUikSkHCGGMSQoh2dJjQlVJm4FHgYqAA2KyUWqW13tlqt23AZK11vVLqZ8ADwDVdEXC32Ph3WH0HDMiAb/3Z6BuXMc9CiB6uMy30KcB+rfUBAKXUy8B3gJaErrVe02r/jcAPAhlkt8rbAB/8DkZfBte8GNix1EII0YU603eQDOS3Wi/wbzuVG4H32ntBKbVEKbVFKbWlpKSk81F2l5pjsHIxxKTB5Y9JMhdC9CqdSejtZbV2xzoqpX4ATAYebO91rfWTWuvJWuvJ8fHxnY+yO3jd8K8fGU95ueZFebqLEKLX6UyXSwHQ+o6SFODoiTsppS4CfgtcoLVuCkx43eiD30P+RvjuM3LhUQjRK3Wmhb4ZGKGUSlNKWYFrgVWtd1BKTQSeABZqrYsDH2YXy3kVNv0dpt1sPBpKCCF6oQ4TutbaAywFVgO7gH9prb9WSi1XSi307/YgEAasVEplK6VWneJwPU/RTlh1q3Gb/MXLgx2NEEKctU6NQ9davwu8e8K2u1otXxTguLpHYxW88gPjGYyLnpOiUkKIXq3/3inq88G/f2Y8pPhHb/sfiyaEEL1X/03o6/8Ce96BBfdB6jeCHY0QQpyz/lkMJPcT+OQeGPddmPrTYEcjhBAB0f8SemU+vHojxI2Cbz8sNw8JIfqM/pXQ3Y3wr+uNm4iuedEoVyuEEH1E/+pDf/83Rjnca16EuOHBjkYIIQKq/7TQv3zBKIk7879gzLeDHY0QQgRc/0joR7PhnV8Zj3Gb87tgRyOEEF2i7yd0dyO8eoPxQOar/gHm/tXLJIToP/p+dsv6f1CeC9e/YSR1IYToo/p2C710H2T9BcYvgmFzgh2NEEJ0qb6b0LWGt/8LLA645I/BjkYIIbpc3+1y2f4SHPoMLnsIwhKCHY0QQnS5vtlCryuD1b+FQVNh0o+CHY0QQnSLvpnQP7wLmqqN1rmpb56iEEKcqO9lu0PrIftF+MZSSEwPdjRCCNFt+lZC9zTB27+AqMFwwW+CHY0QQnSrvnVRdP3DULoXvv8qWEODHY0QQnSrvtNCL8uFTx+E9MthxMXBjkYIIbpd30joWsM7v4QQm/EEIiGE6If6RpdLzqtwYC18808QMSDY0QghRFD0/hZ6QwWsvgOSz4PJNwQ7GiGECJre30L/aBnUl8MPXgeTOdjRCCFE0PTuFvrhTcZDK6b9DAZMCHY0QggRVL03oXvdxpjziBS48I5gRyOEEEHXe7tcPn8EinfCtS/Jw56FEILe2kKvOARr74fRl8HobwY7GiGE6BF6X0LXGt75tXEB9NL7gx2NEEL0GL0voe98A/Z/CHN+C5EpwY5GCCF6jN6X0G3hMOpbMGVJsCMRQogepfddFB1+kTEJIYRoo/e10IUQQrSrUwldKbVAKbVHKbVfKXV7O6/blFKv+F/fpJQaEuhAhRBCnF6HCV0pZQYeBS4F0oHrlFInPgroRqBCaz0c+Asgw0+EEKKbdaaFPgXYr7U+oLV2AS8D3zlhn+8Az/uXXwXmKaVU4MIUQgjRkc4k9GQgv9V6gX9bu/torT1AFRB74oGUUkuUUluUUltKSkrOLmIhhBDt6kxCb6+lrc9iH7TWT2qtJ2utJ8fHx3cmPiGEEJ3UmYReAAxqtZ4CHD3VPkqpECASKA9EgEIIITqnMwl9MzBCKZWmlLIC1wKrTthnFfAj//JVwCda65Na6EIIrVOJOQAABKtJREFUIbqO6kzeVUp9E3gIMAP/0Frfq5RaDmzRWq9SStmBF4CJGC3za7XWBzo4ZgmQ18FHxwGlHZ9Gn9Sfzx369/n353OH/n3+nTn3VK11u33WnUrowaKU2qK1nhzsOIKhP5879O/z78/nDv37/M/13OVOUSGE6CMkoQshRB/R0xP6k8EOIIj687lD/z7//nzu0L/P/5zOvUf3oQshhOi8nt5CF0II0UmS0IUQoo/okQm9o3K9fY1S6h9KqWKl1FettsUopT5USu3zz6ODGWNXUUoNUkqtUUrtUkp9rZT6T//2/nL+dqXUF0qp7f7zv9u/Pc1finqfvzS1NdixdhWllFkptU0p9bZ/vV+cu1LqkFIqRymVrZTa4t92Tv/ve1xC72S53r7mOWDBCdtuBz7WWo8APvav90Ue4Fda6zHANOAW/793fzn/JmCu1joDyAQWKKWmYZSg/ov//CswSlT3Vf8J7Gq13p/OfY7WOrPV2PNz+n/f4xI6nSvX26dorT/l5No3rUsSPw//v737eZWyiuM4/v6QLqIESVIkkUu0sI3Uxo0uRKSNF3GhmwzqT2ghQW2CVq2ioqVBGxOE0lwqqCAKLtRALTf9wIXiXUm1adF8WpxzaYgJrw4z83TO5wWXmTNzuPd84TzfezgPz/dwaK6DmhPbD2zfqO9/p1zYL9FP/Lb9R22urz8G9lFKUUPD8UvaBhwAjte26CT2/zDVvB9iQl9Lud4ebLH9AErSAzYveDwzV0+6eh24Rkfx1y2H74EV4DzwE/ColqKGtq+BT4H3gFFtb6Kf2A2ck3Rd0uqp91PN+yEeEr2mUrzRFknPA98A79r+rafzUWz/BbwmaSNwGnh1Urf5jmr2JC0DK7avS9q7+vGErs3FXu22fV/SZuC8pLvT/sIhrtDXUq63Bw8lbQWorysLHs/MSFpPSeYnbH9bP+4m/lW2HwGXKPcSNtZS1NDuNbAbOCjpV8rW6j7Kir2H2LF9v76uUP6R72LKeT/EhL6Wcr09GC9J/Dbw3QLHMjN1z/RL4Efbn4x91Uv8L9aVOZKeBfZT7iNcpJSihkbjt/2+7W22lyjX+QXbR+kgdknPSdqw+h54A7jNlPN+kE+KTirXu+AhzZSkk8BeSunMh8CHwBngFLAduAccsd3coSGS9gCXgVv8s4/6AWUfvYf4d1Jufj1DWWCdsv2RpJcpq9YXgJvAW7b/XNxIZ6tuuRyzvdxD7DXG07W5Dvi6liXfxBTzfpAJPSIintwQt1wiIuIpJKFHRDQiCT0iohFJ6BERjUhCj4hoRBJ6xBhJBx9X4VPS0nhlzIihGOKj/xELY/ssfT7IFg3ICj26UVfWdyUdl3Rb0glJ+yVdqfWnd0l6R9IXtf9Xkj6XdFXSz5IOP+5vRCxSEnr05hXgM2AnsAN4E9gDHKM8ofpvW+v3y8DHcxpjxFNJQo/e/GL7lu0RcIdymIAppQeWJvQ/Y3tk+wdgyxzHGfHEktCjN+M1QUZj7RGT7ymN9++npm/8LyWhR0Q0Igk9IqIRqbYYEdGIrNAjIhqRhB4R0Ygk9IiIRiShR0Q0Igk9IqIRSegREY1IQo+IaMTfnMe9CCHUUy0AAAAASUVORK5CYII=\n", "text/plain": [""]}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}], "source": ["df.plot(x=\"minl\", y=[\"r2_train_dt\", \"r2_test_dt\",\n", " \"r2_train_reg\", \"r2_test_reg\",\n", " \"r2_train_rf\", \"r2_test_rf\"]);"]}, {"cell_type": "markdown", "metadata": {}, "source": ["A l'inverse de l'arbre de r\u00e9gression, la for\u00eat al\u00e9atoire est meilleure lorsque ce param\u00e8tre est petit. Une for\u00eat est une moyenne de mod\u00e8le, chacun appris sur un sous-\u00e9chantillon du jeu de donn\u00e9es initiale. M\u00eame si un arbre apprend par coeur, il est peu probable que son voisin ait appris le m\u00eame sous-\u00e9chantillon. En faisant la moyenne, on fait un compromis."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Validation crois\u00e9e\n", "\n", "Il reste \u00e0 v\u00e9rifier que le mod\u00e8le est robuste. C'est l'objet de la validation crois\u00e9e qui d\u00e9coupe le jeu de donn\u00e9es en 5 parties, apprend sur 4, teste une 1 puis recommence 5 fois en faisant varier la partie qui sert \u00e0 tester."]}, {"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.4s finished\n"]}, {"data": {"text/plain": ["array([0.05037733, 0.24594631, 0.25811598, 0.348578 , 0.2462281 ])"]}, "execution_count": 25, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.model_selection import cross_val_score\n", "cross_val_score(\n", " RandomForestRegressor(n_estimators=25), X, y, cv=5,\n", " verbose=1)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Ce r\u00e9sultat doit vous interrompre car les performances sont loin d'\u00eatre stables. Deux options : soit le mod\u00e8le n'est pas robuste, soit la m\u00e9thodologie est fausse quelque part. Comme le probl\u00e8me est assez simple, il est probable que ce soit la seconde option : la jeu de donn\u00e9es est tri\u00e9e. Les vins rouges d'abord, les blancs ensuite. Il est possible que la validation crois\u00e9e estime un mod\u00e8le sur des vins rouges et l'appliquent \u00e0 des vins blancs. Cela ne marche pas visiblement. Cela veut dire aussi que les vins blancs et rouges sont tr\u00e8s diff\u00e9rents et que la couleur est probablement une information redondante avec les autres. M\u00e9langeons les donn\u00e9es au hasard."]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": ["from sklearn.utils import shuffle\n", "X2, y2 = shuffle(X, y)"]}, {"cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 5.6s finished\n"]}, {"data": {"text/plain": ["array([0.47975777, 0.50951094, 0.49514404, 0.51110336, 0.51584857])"]}, "execution_count": 27, "metadata": {}, "output_type": "execute_result"}], "source": ["cross_val_score(\n", " RandomForestRegressor(n_estimators=25), X2, y2, cv=5,\n", " verbose=1)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Beaucoup mieux. On peut faire comme \u00e7a aussi."]}, {"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [{"name": "stderr", "output_type": "stream", "text": ["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 5 out of 5 | elapsed: 6.6s finished\n"]}, {"data": {"text/plain": ["array([0.53754932, 0.54227221, 0.5442236 , 0.57726314, 0.53393994])"]}, "execution_count": 28, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.model_selection import ShuffleSplit\n", "cross_val_score(\n", " RandomForestRegressor(n_estimators=25), X, y, cv=ShuffleSplit(5),\n", " verbose=1)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Pipeline\n", "\n", "On peut caler un mod\u00e8le apr\u00e8s une ACP mais il faut bien se souvenir de toutes les \u00e9tapes interm\u00e9diaires avant de pr\u00e9dire avec le mod\u00e8le final."]}, {"cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [{"data": {"text/plain": ["PCA(copy=True, iterated_power='auto', n_components=6, random_state=None,\n", " svd_solver='auto', tol=0.0, whiten=False)"]}, "execution_count": 29, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.decomposition import PCA\n", "pca = PCA(6)\n", "pca.fit(X_train, y_train)"]}, {"cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [{"data": {"text/plain": ["RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n", " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", " max_samples=None, min_impurity_decrease=0.0,\n", " min_impurity_split=None, min_samples_leaf=1,\n", " min_samples_split=2, min_weight_fraction_leaf=0.0,\n", " n_estimators=100, n_jobs=None, oob_score=False,\n", " random_state=None, verbose=0, warm_start=False)"]}, "execution_count": 30, "metadata": {}, "output_type": "execute_result"}], "source": ["rf = RandomForestRegressor(n_estimators=100)\n", "X_train_pca = pca.transform(X_train)\n", "rf.fit(X_train_pca, y_train)"]}, {"cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.421429956568139"]}, "execution_count": 31, "metadata": {}, "output_type": "execute_result"}], "source": ["X_test_pca = pca.transform(X_test)\n", "pred = rf.predict(X_test_pca)\n", "r2_score(y_test, pred)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Ou alors on utilise le concept de *pipeline* qui permet d'assembler les pr\u00e9traitements et le mod\u00e8le pr\u00e9dictif sous la forme d'une s\u00e9quence de traitement qui devient le mod\u00e8le unique."]}, {"cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": ["from sklearn.pipeline import Pipeline\n", "pipe = Pipeline([\n", " ('acp', PCA(n_components=6)),\n", " ('rf', RandomForestRegressor(n_estimators=100))\n", "])\n", "pipe.fit(X_train, y_train);"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Grille de recherche\n", "\n", "De cette fa\u00e7on, on peut chercher simplement les meilleurs hyperparam\u00e8tres du mod\u00e8le."]}, {"cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Fitting 3 folds for each of 12 candidates, totalling 36 fits\n"]}, {"name": "stderr", "output_type": "stream", "text": ["[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.\n", "[Parallel(n_jobs=1)]: Done 36 out of 36 | elapsed: 44.4s finished\n"]}, {"data": {"text/plain": ["GridSearchCV(cv=ShuffleSplit(n_splits=3, random_state=None, test_size=None, train_size=None),\n", " error_score=nan,\n", " estimator=Pipeline(memory=None,\n", " steps=[('acp',\n", " PCA(copy=True, iterated_power='auto',\n", " n_components=6, random_state=None,\n", " svd_solver='auto', tol=0.0,\n", " whiten=False)),\n", " ('rf',\n", " RandomForestRegressor(bootstrap=True,\n", " ccp_alpha=0.0,\n", " criterion='mse',\n", " max_depth=None,\n", " m...\n", " min_samples_leaf=1,\n", " min_samples_split=2,\n", " min_weight_fraction_leaf=0.0,\n", " n_estimators=100,\n", " n_jobs=None,\n", " oob_score=False,\n", " random_state=None,\n", " verbose=0,\n", " warm_start=False))],\n", " verbose=False),\n", " iid='deprecated', n_jobs=None,\n", " param_grid={'acp__n_components': [1, 4, 7, 10],\n", " 'rf__n_estimators': [10, 20, 50]},\n", " pre_dispatch='2*n_jobs', refit=True, return_train_score=False,\n", " scoring=None, verbose=1)"]}, "execution_count": 33, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.model_selection import GridSearchCV\n", "param_grid = {'acp__n_components': list(range(1, 11, 3)),\n", " 'rf__n_estimators': [10, 20, 50]}\n", "grid = GridSearchCV(pipe, param_grid=param_grid, verbose=1,\n", " cv=ShuffleSplit(3))\n", "grid.fit(X, y)"]}, {"cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [{"data": {"text/plain": ["{'acp__n_components': 10, 'rf__n_estimators': 50}"]}, "execution_count": 34, "metadata": {}, "output_type": "execute_result"}], "source": ["grid.best_params_"]}, {"cell_type": "code", "execution_count": 34, "metadata": {"scrolled": false}, "outputs": [{"data": {"text/plain": ["array([7.1 , 5.06, 7. , ..., 6.74, 4.12, 5.98])"]}, "execution_count": 35, "metadata": {}, "output_type": "execute_result"}], "source": ["grid.predict(X_test)"]}, {"cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.9275290318700775"]}, "execution_count": 36, "metadata": {}, "output_type": "execute_result"}], "source": ["r2_score(y_test, grid.predict(X_test))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Ce nombre para\u00eet beaucoup trop beau pour \u00eatre vrai. Cela signifie sans doute que les donn\u00e9es de test ont \u00e9t\u00e9 utilis\u00e9s pour effectuer la recherche."]}, {"cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.49487646056265816"]}, "execution_count": 37, "metadata": {}, "output_type": "execute_result"}], "source": ["grid.best_score_"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Nettement plus plausible."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Enregistrer, restaurer\n", "\n", "Le moyen le plus simple de conserver les mod\u00e8les en python est de les s\u00e9rialiser : on copie la m\u00e9moire sur disque puis on la restaure plus tard."]}, {"cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": ["import pickle\n", "\n", "with open('piperf.pickle', 'wb') as f:\n", " pickle.dump(grid, f)"]}, {"cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [{"data": {"text/plain": ["['piperf.pickle']"]}, "execution_count": 39, "metadata": {}, "output_type": "execute_result"}], "source": ["import glob\n", "glob.glob('*.pickle')"]}, {"cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": ["with open(\"piperf.pickle\", 'rb') as f:\n", " grid2 = pickle.load(f)"]}, {"cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([7.1 , 5.06, 7. , ..., 6.74, 4.12, 5.98])"]}, "execution_count": 41, "metadata": {}, "output_type": "execute_result"}], "source": ["grid2.predict(X_test)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Pr\u00e9diction de la couleur\n", "\n", "Le fait que la premi\u00e8re validation crois\u00e9e \u00e9choue \u00e9tait un signe que la couleur \u00e9tait facilement pr\u00e9visible. V\u00e9rifions."]}, {"cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": ["Xc = df_data.drop(['quality', 'color'], axis=1)\n", "yc = df_data[\"color\"]"]}, {"cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": ["Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc)"]}, {"cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": ["from sklearn.linear_model import LogisticRegression\n", "log = LogisticRegression(solver='lbfgs', max_iter=1500)\n", "log.fit(Xc_train, yc_train);"]}, {"cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.04459922717947637"]}, "execution_count": 45, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.metrics import log_loss\n", "log_loss(yc_test, log.predict_proba(Xc_test))"]}, {"cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[ 391, 14],\n", " [ 9, 1211]], dtype=int64)"]}, "execution_count": 46, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.metrics import confusion_matrix\n", "confusion_matrix(yc_test, log.predict(Xc_test))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["La matrice de confusion est plut\u00f4t explicite."]}, {"cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}