.. blogpost::
    :title: Plan des séances
    :keywords: plan
    :date: 2021-01-28
    :categories: session

    Voici le plan prévu pour les cinq séances
    du cours de machine learning pour l'économie et
    la finance.

    **Séance 1**

    * `Python <https://www.python.org/>`_,
      `Anaconda <https://www.anaconda.com/>`_,
      :epkg:`pandas`, :epkg:`numpy`, :epkg:`jupyter`,
      :epkg:`matplotlib`
    * Ressource `www.xavierdupre.fr <www.xavierdupre.fr>`_
    * `Source de données <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/ressources.html#source-de-donnees>`_
    * :epkg:`scikit-learn`
    * :ref:`l-regclass`, multi-classification
    * `métriques <https://scikit-learn.org/stable/modules/model_evaluation.html>`_
    * `Imbalanced problem <https://github.com/scikit-learn-contrib/imbalanced-learn>`_
    * Courbe ROC, AUC, voir :ref:`winescolorrocrst`,
      `Courbe ROC <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_metric/roc.html>`_,
      `graphe erreur ROC <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/ml_c_machine_learning_problems.html#graphe-erreur-roc>`_
    * `Ridge, Lasso <http://www.xavierdupre.fr/app/papierstat/helpsphinx/notebooks/2020-02-07_sklapi.html>`_
    * Decision Tree,
      `Arbres de décision / Random Forest
      <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/td2a_cenonce_session_3B.html>`_,
      `Prédiction d'une durée
      <http://www.xavierdupre.fr/app/papierstat/helpsphinx/notebooks/artificiel_duration_prediction.html>`_

    **Séance 2**

    * `Valeurs manquantes (cheatsheet) <https://www.math.univ-toulouse.fr/~besse/Wikistat/pdf/st-m-app-idm.pdf>`_,
      `Valeurs manquantes (notebooks) <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/ml2a/td2a_mlbasic_valeurs_manquantes.html>`_
    * `Réseaux de neurones <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn.html>`_
      (inférence, gradient)
    * Apprentissage à base de gradients, différences avec les arbres
	* `Deep learning <http://www.xavierdupre.fr/app/ensae_teaching_dl/helpsphinx/chapters/deep_reseaux_de_neurones_et_deep_learning.html>`_
	* :epkg:`tensorflow`, :epkg:`pytorch`, :epkg:`paddlepaddle`, :epkg:`chainer`
	* GPU ?
	* Zoo de modèles, `paddlepaddle <https://github.com/PaddlePaddle/PaddleClas>`_,
      `TensorFlow Hub <https://www.tensorflow.org/hub>`_,
      `PyTorch Zoo <https://pytorch.org/docs/stable/torchvision/models.html>`_
	* `Transfer Learning <http://www.xavierdupre.fr/app/mlprodict/helpsphinx/notebooks/transfer_learning.html>`_
    * Mise en production... `ONNX <https://onnx.ai/>`_ ?

    **Séance 3**

    * `Série temporelles <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/ml2a/td2a_mlplus_timeseries_series_temporelles.html>`_
	* :epkg:`statsmodels`, modèles AR, :epkg:`fbprophet`
	* Prédiction
	* Série liées au COVID
	* Réalisation de cartes
	* Modèles épidémiologiques SIRD

    **Séance 4**

    * Webscrapping
	* JSON, xml
	* Clustering
	* Traitement du texte
	* Catégories (encoding, dirtycat)
    * Texte libre, (tokenisation, encoder, tf-idf, deep learning)

    **Séance 5**

    * Pipeline scikit-learn
	* Étendre scikit-learn avec ses propres modèles
	* Créer un module python
	* Test unitaires
    * Interprétabilité des modèles
	* Modèles éthiques
    * Machine learning crypté
    * Survival analysis

    **Un peu plus de détails sur la séance 1**

    Projet de machine learning
	* Récupérer les données  --> dataframe (pandas.read_csv("...", sep=";"))
	* Features = X + cible y : objectif prédire y
	* Séparer entre données apprentissage (train) et données test (test)
        * Estimer le modèle sur la base train
        * Evaluer sur la base test
    * Validation croisée pour vérifier la robustesse du modèle
    * La classification multiclasse est construire à partir de classifications
      binaires selon deux stratégies : une contre toute (One Versus Rest) ou
      les unes contre les autres (One Versus One)

    Un peu plus en détail :

    * `Courbe ROC <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_metric/roc.html>`_
    * Pour la régression Lasso est utilisée pour sélectionner les variables
      en annulant les coefficients : :ref:`Lasso <2020-02-07sklapirst>`.

    **Un peu plus de détails sur la séance 2**

    * `Valeurs manquantes (cheatsheet)
      <https://www.math.univ-toulouse.fr/~besse/Wikistat/pdf/st-m-app-idm.pdf>`_,
      `Valeurs manquantes (notebooks)
      <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/ml2a/td2a_mlbasic_valeurs_manquantes.html>`_ :
        * méthode naïve
        * corrélations
        * `KNNImputer
          <https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html>`_
    * `Réseaux de neurones
      <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn.html>`_
      (inférence, gradient)
        * `un neurone <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_1_def.html#un-neurone>`_
        * `réseau de neurones
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_1_def.html#un-reseau-de-neurones-le-perceptron>`_
        * propagation (évaluation)
        * `densité
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_4_densite.html#densite-des-reseaux-de-neurones>`_
        * `apprentissage à base de gradient
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_5_newton.html>`_
        * `rétropropagation
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_5_newton.html#calcul-du-gradient-ou-retropropagation>`_
        * `gradient stochastique
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_6_apprentissage.html#apprentissage-avec-gradient-stochastique>`_,
          `implémentation
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/mlstatpy/optim/sgd.html#mlstatpy.optim.sgd.SGDOptimizer>`_
        * `régression
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_2_reg.html>`_
        * `classification
          <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/rn/rn_7_clas2.html#reseau-de-neurones-adequat>`_
        * normalisation
    * Réseau de neurones ou arbres ?
    * `Gradient Boosting <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/gradient_boosting.html>`_
	* `Deep learning <http://www.xavierdupre.fr/app/ensae_teaching_dl/helpsphinx/chapters/deep_reseaux_de_neurones_et_deep_learning.html>`_
    * `Réseaux à convolution <https://en.wikipedia.org/wiki/Convolutional_neural_network>`_
	* :epkg:`tensorflow`, :epkg:`pytorch`, :epkg:`paddlepaddle`, :epkg:`chainer`
    * `exemple
      <http://www.xavierdupre.fr/app/ensae_teaching_dl/helpsphinx/all_notebooks.html>`_
    * `Yolo <https://www.youtube.com/watch?v=Cgxsv1riJhI>`_,
      `Yolo explanations <https://www.youtube.com/watch?v=NM6lrxy0bxs>`_
    * `Parler avec les mains à une machine
      <https://research.fb.com/publications/constraining-dense-hand-surface-tracking-with-elasticity/>`_
	* GPU ?
	* Zoo de modèles, `paddlepaddle <https://github.com/PaddlePaddle/PaddleClas>`_,
      `TensorFlow Hub <https://www.tensorflow.org/hub>`_,
      `PyTorch Zoo <https://pytorch.org/docs/stable/torchvision/models.html>`_
	* `Transfer Learning <http://www.xavierdupre.fr/app/mlprodict/helpsphinx/notebooks/transfer_learning.html>`_
    * Mise en production... `docker <https://www.docker.com/>`_,
      `ONNX <https://onnx.ai/>`_ ?
    * attention, transformers

    **Un peu plus de détails sur la séance 3**

    * :ref:`tscovidrst`
    * :ref:`tspredrst`
    * `Single Spectrum Analysis (SSA)
      <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/timeseries_ssa.html>`_
    * `aftercovid <http://www.xavierdupre.fr/app/aftercovid/helpsphinx/index.html>`_
    * `Algo - simulation COVID
      <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/2020_covid.html>`_
    * `Evolution d'un voisinage, d'une population
      <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/specials/voisinage.html?highlight=propagation_epidemic>`_

    Les données du jour :
    `Chiffres-clés concernant l'épidémie de COVID19 en France
    <https://www.data.gouv.fr/en/datasets/chiffres-cles-concernant-lepidemie-de-covid19-en-france/>`_.

    **Un peu plus de détails sur la séance 4**

    Covid ?

    *scraping*

    * `JSON <https://www.json.org/json-en.html>`_,
      `xml <https://fr.wikipedia.org/wiki/Extensible_Markup_Language>`_
    * `Web-Scraping <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/TD2A_Eco_Web_Scraping.html>`_,
      `Web-Scraping (correction) <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/TD2A_Eco_Web_Scraping_corrige.html>`_,
      `Scraping, récupérer une image depuis LeMonde <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/2018-10-02_scraping_recuperer_images.html>`_
    * cache, javascript
    * blacklist
    * `TOR <https://fr.wikipedia.org/wiki/Tor_(r%C3%A9seau)>`_
    * `wikipedia dump <https://dumps.wikimedia.org/>`_
    * `expression régulière <https://docs.python.org/3/library/re.html>`_

    *API REST*

    * `API REST <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/TD2A_eco_les_API.html>`_,
      `API REST SNCF <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/TD2A_eco_API_SNCF.html>`_,
      `API REST SNCF (corrigé) <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/TD2A_eco_API_SNCF_corrige.html>`_
    * `Créer sa propre API REST <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/2020_rest.html>`_

    *Catégories*

    * `OneHotEncoder <https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html>`_
    * `category_encoders <https://contrib.scikit-learn.org/category_encoders/>`_
    * 1 colonne par catégorie ça fait beaucoup de colonne
    * pourquoi encoder --> ordre des catégories
    * `dirty_cat <https://dirty-cat.github.io/stable/>`_

    *Texte libre*

    * `Introduction au text mining <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/td2a_eco_NLP_tf_idf_ngrams_LDA_word2vec_sur_des_extraits_litteraires.html>`_
    * `Analyse de sentiments <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/td2a_sentiment_analysis.html>`_,
      `Analyse de sentiments - correction <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/notebooks/td2a_sentiment_analysis_correction.html>`_
    * bag of words (sac de mots), tokenisation, stop-words, TD-IDF, n-grammes
    * word embedding (word2vec, gloves)
    * `auto-encoders <https://en.wikipedia.org/wiki/Autoencoder>`_
    * `HuggingFace <https://huggingface.co/>`_,
      `hugging face transformers <https://huggingface.co/transformers/>`_,
      `huggingface github <https://github.com/huggingface/transformers>`_
    * `spacy <https://spacy.io/>`_,
      `nltk <https://www.nltk.org/>`_,
      `gensim <https://radimrehurek.com/gensim/auto_examples/index.html>`_

    *topics*

    * `LDA <https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation>`_

    *devinette*

    Un chalutier mesure la taille de tous les poissons pris.
    Comment estimer la proportion de mâles et femelles sachant
    qu'ils n'ont en moyenne pas la même taille ?

    **Un peu plus de détails sur la séance 5**

    *scikit-learn API*

    Construire un modèle qui effectue
    une rotation des features avant un arbre de décision,
    voir `API de sciki-learn et modèles customisés
    <http://www.xavierdupre.fr/app/jupytalk/helpsphinx/notebooks/sklearn_api.html>`_,
    `FunctionTransformer
    <https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html>`_.

    *Analyse de survie*

    `Analyse de survie
    <http://www.xavierdupre.fr/app/mlstatpy/helpsphinx/c_ml/survival_analysis.html>`_

    *Interprétabilité* (voir `Interpretable Machine Learning
    <https://christophm.github.io/interpretable-ml-book/>`_)

    * `Partial Dependence and Individual Conditional Expectation plots
      <https://scikit-learn.org/stable/modules/partial_dependence.html>`_,
      `SOME GENERAL THOUGHTS ON PARTIAL DEPENDENCE PLOTS WITH CORRELATED COVARIATES
      <https://freakonometrics.hypotheses.org/61737>`_
    * `LIME (Local Interpretable Model-Agnostic Explanations) <https://arxiv.org/pdf/1602.04938v1.pdf>`_
        * Approximer localement la prédiction d'un modèle par un modèle interprétable (comme une régression Lasso),
          cela revient en quelque sorte à calculer le gradient du modèle en chaque feature pour un point donnée.
        * .. image:: lime.png
    	* Simplifier l'analyse en groupant les variables (pixels) si trop de variables
    * `SHAP (SHapley Additive exPlanations) <https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf>`_
		* Value d'une variable : on calcule l'espérance de la prédiction
          en tirant aléatoirement des valeurs pour cette variable (loi marginal),
          on fait la différence avec la prédiction moyenne.
        * Lire `Shapley Values <https://christophm.github.io/interpretable-ml-book/shapley.html>`_
		* .. image:: shap.png
		* .. image:: shap2.png
    * `CounterFactual <https://christophm.github.io/interpretable-ml-book/counterfactual.html>`_
		* Dérivées partielles
        * La prédiction est *Y*, on souhaite *Z*,
          quelle est le plus petit changement dans *X* pour avoir *Z* ?
    * Cas des variables corrélées

    *Ethique*

    * `Machine Learning éthique
      <http://www.xavierdupre.fr/app/ensae_teaching_cs/helpsphinx/ml2a/td2a_mlplus_machine_learning_ethique.html>`_
    * `fairlearn <https://github.com/fairlearn/fairlearn>`_
    * variable de contrôle