{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Refresher on scikit-learn and machine learning\n", "\n", "A few simple exercises on *scikit-learn*. The notebook is long for those who are new to machine learning and probably holds no surprises for those who have already practiced it."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Synthetic data\n", "\n", "We simulate a random dataset."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[0.25324685, 0.97811479],\n", " [0.32928095, 0.40816327],\n", " [0.44178633, 0.51600754],\n", " [0.76893618, 0.34170807],\n", " [0.00282938, 0.49371721]])"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["from numpy import random\n", "n = 1000\n", "# n points drawn uniformly in [0, 1)^2\n", "X = random.rand(n, 2)\n", "X[:5]"]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([-0.18666718, 1.29326419, 1.64748543, 2.39341326, 0.06048883])"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["# Target: linear in the first feature, quadratic in the second, plus uniform noise in [0, 1)\n", "y = X[:, 0] * 3 - 2 * X[:, 1] ** 2 + random.rand(n)\n", "y[:5]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 1: split into training and test sets"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 2: fit a linear regression\n", "\n", "And compute the $R^2$ coefficient."]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 3: improve the model by applying a well-chosen transformation"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 4: fit a random forest"]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 5: a bit of math\n", "\n", "Compare the two models on the following data. What do you notice? Explain why."]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": ["# Same generating function as above, with features shifted to [0.5, 1.5)\n", "X_test2 = random.rand(n, 2) + 0.5\n", "y_test2 = X_test2[:, 0] * 3 - 2 * X_test2[:, 1] ** 2 + random.rand(n)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 6: make a plot with...\n", "\n", "The scatter plot of the first and second datasets, the predictions of both models, a legend, a title... with [pandas](https://pandas.pydata.org/) or directly with [matplotlib](https://matplotlib.org/), your choice."]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 7: illustrate overfitting with a decision tree\n", "\n", "On the first dataset."]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 8: increase the number of features and regularize a logistic regression\n", "\n", "The goal is to look at the impact of regularizing the coefficients of a logistic regression as the number of features increases."]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0"}}, "nbformat": 4, "nbformat_minor": 2}