.. _winesmultistackingrst:

=======================================
Multi-class classification and stacking
=======================================

.. only:: html

    **Links:** :download:`notebook `, :downloadlink:`html `, :download:`PDF `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/lectures/wines_multi_stacking.ipynb|*`

We try to predict the grade of a wine with a multi-class classifier,
then to improve the obtained score with a method called
`stacking `__.

.. code:: ipython3

    from jyquickhelper import add_notebook_menu
    add_notebook_menu()

.. contents::
    :local:

The problem
-----------

There is no guarantee that the scores of the models trained on each
class are comparable. If the model does not perform well enough, one
can add a final model that makes the decision based on the output of
every intermediate model.

.. code:: ipython3

    from pyquickhelper.helpgen import NbImage
    NbImage('images/stackmulti.png', width=400)

.. image:: wines_multi_stacking_3_0.png
    :width: 400px

.. code:: ipython3

    %matplotlib inline

.. code:: ipython3

    from papierstat.datasets import load_wines_dataset
    df = load_wines_dataset()
    X = df.drop(['quality', 'color'], axis=1)
    y = df['quality']

.. code:: ipython3

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y)

.. code:: ipython3

    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    clr = OneVsRestClassifier(LogisticRegression(max_iter=1500))
    clr.fit(X_train, y_train)

.. parsed-literal::

    OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None,
                                                     dual=False, fit_intercept=True,
                                                     intercept_scaling=1,
                                                     l1_ratio=None, max_iter=1500,
                                                     multi_class='auto',
                                                     n_jobs=None, penalty='l2',
                                                     random_state=None,
                                                     solver='lbfgs', tol=0.0001,
                                                     verbose=0, warm_start=False),
                        n_jobs=None)
.. code:: ipython3

    import numpy
    numpy.mean(clr.predict(X_test).ravel() == y_test.ravel()) * 100

.. parsed-literal::

    53.907692307692315

Let's look at the confusion matrix.

.. code:: ipython3

    from sklearn.metrics import confusion_matrix
    import pandas
    df = pandas.DataFrame(confusion_matrix(y_test, clr.predict(X_test)))
    try:
        df.columns = [str(_) for _ in clr.classes_][:df.shape[1]]
        df.index = [str(_) for _ in clr.classes_][:df.shape[0]]
    except ValueError:
        # A class may be missing from the training set.
        print("erreur", df.shape, clr.classes_)
    df

.. parsed-literal::

       3  4    5    6   7  8  9
    3  0  0    2    1   0  1  0
    4  0  0   39   15   1  0  0
    5  0  0  314  201   2  0  0
    6  0  0  170  537   9  0  0
    7  0  0   17  233  25  0  0
    8  0  0    2   47   6  0  0
    9  0  0    0    2   1  0  0
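Reading the matrix row by row, the recall of each class is the diagonal
term divided by the row sum. A minimal sketch on a small illustrative
matrix (the three rows mimic classes 5, 6 and 7 above, but the numbers
serve only as an example):

```python
import numpy

# Illustrative confusion matrix (rows = true class, columns = predicted class).
cm = numpy.array([[314, 201,   2],
                  [170, 537,   9],
                  [ 17, 233,  25]])

# Recall of each class: diagonal term divided by the row sum.
recall = numpy.diag(cm) / cm.sum(axis=1)
print(recall.round(3))
```

The dominant middle classes absorb most of the predictions, which is
why the overall accuracy hides very poor recall on the rare grades.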
We first fit a random forest on the raw features.

.. code:: ipython3

    from sklearn.ensemble import RandomForestClassifier
    rfc = RandomForestClassifier()
    rfc.fit(X_train, y_train)
    numpy.mean(rfc.predict(X_test).ravel() == y_test.ravel()) * 100

.. parsed-literal::

    67.44615384615385

We then fit a random forest on the outputs of the logistic regressions.

.. code:: ipython3

    rf_train = clr.decision_function(X_train)
    rfc_y = RandomForestClassifier()
    rfc_y.fit(rf_train, y_train)

.. parsed-literal::

    RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                           criterion='gini', max_depth=None, max_features='auto',
                           max_leaf_nodes=None, max_samples=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=100,
                           n_jobs=None, oob_score=False, random_state=None,
                           verbose=0, warm_start=False)

We compute the error rate.

.. code:: ipython3

    rf_test = clr.decision_function(X_test)
    numpy.mean(rfc_y.predict(rf_test).ravel() == y_test.ravel()) * 100

.. parsed-literal::

    64.8

This is almost equivalent to a random forest fitted on the raw
features. We draw the ROC curves for class 4. Note that
``clr.classes_`` is ``[3, 4, 5, 6, 7, 8, 9]``, so the scores of class 4
sit in column index 1; the code below uses column index 2, which
actually holds the scores of class 5.

.. code:: ipython3

    from sklearn.metrics import roc_curve, roc_auc_score

    fpr_lr, tpr_lr, th_lr = roc_curve(y_test == 4,
                                      clr.decision_function(X_test)[:, 2])
    fpr_rfc, tpr_rfc, th_rfc = roc_curve(y_test == 4,
                                         rfc.predict_proba(X_test)[:, 2])
    fpr_rfc_y, tpr_rfc_y, th_rfc_y = roc_curve(y_test == 4,
                                               rfc_y.predict_proba(rf_test)[:, 2])
    auc_lr = roc_auc_score(y_test == 4, clr.decision_function(X_test)[:, 2])
    auc_rfc = roc_auc_score(y_test == 4, rfc.predict_proba(X_test)[:, 2])
    auc_rfc_y = roc_auc_score(y_test == 4, rfc_y.predict_proba(rf_test)[:, 2])
    auc_lr, auc_rfc, auc_rfc_y

.. parsed-literal::

    (0.7485466126230457, 0.6752634626519977, 0.6984597568037059)
.. code:: ipython3

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot([0, 1], [0, 1], 'k--')
    ax.plot(fpr_lr, tpr_lr, label="OneVsRest + LR")
    ax.plot(fpr_rfc, tpr_rfc, label="RF")
    ax.plot(fpr_rfc_y, tpr_rfc_y, label="OneVsRest + LR + RF")
    ax.set_title('Courbe ROC - comparaison de deux\nmodèles pour la classe 4')
    ax.legend();

.. image:: wines_multi_stacking_19_0.png

The ROC curve shows nothing conclusive. This should be checked with a
cross-validation, which would be convenient to run with a
`pipeline `__, but pipelines only accept a single final predictor.

.. code:: ipython3

    from sklearn.pipeline import make_pipeline
    try:
        pipe = make_pipeline(
            OneVsRestClassifier(LogisticRegression(max_iter=1500)),
            RandomForestClassifier())
    except Exception as e:
        print('ERREUR :')
        print(e)

.. parsed-literal::

    ERREUR :
    All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None,
                                                     dual=False, fit_intercept=True,
                                                     intercept_scaling=1,
                                                     l1_ratio=None, max_iter=1500,
                                                     multi_class='auto',
                                                     n_jobs=None, penalty='l2',
                                                     random_state=None,
                                                     solver='lbfgs', tol=0.0001,
                                                     verbose=0, warm_start=False),
                        n_jobs=None)' (type ) doesn't

We build a ROC curve over all the classes.
.. code:: ipython3

    fpr_lr, tpr_lr, th_lr = roc_curve(y_test == clr.predict(X_test),
                                      clr.predict_proba(X_test).max(axis=1),
                                      drop_intermediate=False)
    fpr_rfc, tpr_rfc, th_rfc = roc_curve(y_test == rfc.predict(X_test),
                                         rfc.predict_proba(X_test).max(axis=1),
                                         drop_intermediate=False)
    fpr_rfc_y, tpr_rfc_y, th_rfc_y = roc_curve(y_test == rfc_y.predict(rf_test),
                                               rfc_y.predict_proba(rf_test).max(axis=1),
                                               drop_intermediate=False)
    auc_lr = roc_auc_score(y_test == clr.predict(X_test),
                           clr.decision_function(X_test).max(axis=1))
    auc_rfc = roc_auc_score(y_test == rfc.predict(X_test),
                            rfc.predict_proba(X_test).max(axis=1))
    auc_rfc_y = roc_auc_score(y_test == rfc_y.predict(rf_test),
                              rfc_y.predict_proba(rf_test).max(axis=1))
    auc_lr, auc_rfc, auc_rfc_y

.. parsed-literal::

    (0.5510406569489914, 0.753562533633215, 0.7354245943989535)

.. code:: ipython3

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot([0, 1], [0, 1], 'k--')
    ax.plot(fpr_lr, tpr_lr, label="OneVsRest + LR")
    ax.plot(fpr_rfc, tpr_rfc, label="RF")
    ax.plot(fpr_rfc_y, tpr_rfc_y, label="OneVsRest + LR + RF")
    ax.set_title('Courbe ROC - comparaison de deux\nmodèles pour toutes les classes')
    ax.legend();

.. image:: wines_multi_stacking_24_0.png

On this model, the score produced by the final classifier looks more
relevant than the score obtained by taking the maximum score over all
the classes. We try one last approach where the final model validates
or rejects the answer: it is a binary classifier, so every estimated
classifier is binary.

.. code:: ipython3

    rf_train_bin = clr.decision_function(X_train)
    y_train_bin = clr.predict(X_train) == y_train
    rfc = RandomForestClassifier()
    rfc.fit(rf_train_bin, y_train_bin)
.. parsed-literal::

    RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                           criterion='gini', max_depth=None, max_features='auto',
                           max_leaf_nodes=None, max_samples=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=100,
                           n_jobs=None, oob_score=False, random_state=None,
                           verbose=0, warm_start=False)

We look at the first answers.

.. code:: ipython3

    rf_test_bin = clr.decision_function(X_test)
    rfc.predict_proba(rf_test_bin)[:3]

.. parsed-literal::

    array([[0.32, 0.68],
           [0.5 , 0.5 ],
           [0.89, 0.11]])

.. code:: ipython3

    y_test_bin = clr.predict(X_test) == y_test

.. code:: ipython3

    fpr_rfc_bin, tpr_rfc_bin, th_rfc_bin = roc_curve(
        y_test_bin, rfc.predict_proba(rf_test_bin)[:, 1])
    auc_rfc_bin = roc_auc_score(y_test_bin, rfc.predict_proba(rf_test_bin)[:, 1])
    auc_lr, auc_rfc_bin

.. parsed-literal::

    (0.5510406569489914, 0.7668718108162481)

.. code:: ipython3

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot([0, 1], [0, 1], 'k--')
    ax.plot(fpr_lr, tpr_lr, label="OneVsRest + LR")
    ax.plot(fpr_rfc, tpr_rfc, label="OneVsRest + LR + RF")
    ax.plot(fpr_rfc_bin, tpr_rfc_bin, label="OneVsRest + LR + RF binaire")
    ax.set_title('Courbe ROC - comparaison de deux\nmodèles pour toutes les classes')
    ax.legend();

.. image:: wines_multi_stacking_31_0.png

Slightly better, but this should still be validated with a
cross-validation and on several datasets, including artificial ones.
The idea remains nonetheless.

Automating it with an implementation
------------------------------------

Since doing all of this by hand is tedious, we implement a class that
turns a machine-learning model into a *transform* that can be inserted
into a pipeline.
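Before switching to the library, the idea can be sketched by hand: a
small transformer, here a hypothetical ``ClassifierAsTransformer``
class that is not part of scikit-learn, exposes a classifier's
``decision_function`` as ``transform`` so the classifier can sit in
the middle of a pipeline. The sketch below runs on the iris data
rather than the wines:

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline


class ClassifierAsTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: exposes a classifier's decision_function
    as transform so the classifier can be an intermediate pipeline step."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator.fit(X, y)
        return self

    def transform(self, X):
        # One score column per class feeds the next estimator.
        return self.estimator.decision_function(X)


X, y = load_iris(return_X_y=True)
pipe = make_pipeline(
    ClassifierAsTransformer(OneVsRestClassifier(LogisticRegression(max_iter=1500))),
    RandomForestClassifier())
pipe.fit(X, y)
print(pipe.score(X, y))
```

This is only a sketch of the mechanism; the ``mlinsights``
implementation used below handles several models and several
score methods at once.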
.. code:: ipython3

    from mlinsights.sklapi import SkBaseTransformStacking
    model = make_pipeline(
        SkBaseTransformStacking(
            [OneVsRestClassifier(LogisticRegression(max_iter=1500))],
            'decision_function'),
        RandomForestClassifier())
    model.fit(X_train, y_train)

.. parsed-literal::

    Pipeline(memory=None,
             steps=[('skbasetransformstacking',
                     SkBaseTransformStacking([OneVsRestClassifier(estimator=LogisticRegression(C=1.0,
                         class_weight=None, dual=False, fit_intercept=True,
                         intercept_scaling=1, l1_ratio=None, max_iter=1500,
                         multi_class='auto', n_jobs=None, penalty='l2',
                         random_state=None, solver='lbfgs', tol=0.0001,
                         verbose=0, warm_start=False), n_j...
                    RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                           class_weight=None, criterion='gini',
                                           max_depth=None, max_features='auto',
                                           max_leaf_nodes=None, max_samples=None,
                                           min_impurity_decrease=0.0,
                                           min_impurity_split=None,
                                           min_samples_leaf=1, min_samples_split=2,
                                           min_weight_fraction_leaf=0.0,
                                           n_estimators=100, n_jobs=None,
                                           oob_score=False, random_state=None,
                                           verbose=0, warm_start=False))],
             verbose=False)

.. code:: ipython3

    fpr_pipe, tpr_pipe, th_pipe = roc_curve(y_test == model.predict(X_test),
                                            model.predict_proba(X_test).max(axis=1),
                                            drop_intermediate=False)
    auc_pipe = roc_auc_score(y_test == model.predict(X_test),
                             model.predict_proba(X_test).max(axis=1))
    auc_pipe

.. parsed-literal::

    0.7330188804547779

We do not forget to shuffle the data before running the cross-validation.

.. code:: ipython3

    df = load_wines_dataset(shuffle=True)
    X = df.drop(['quality', 'color'], axis=1)
    y = df['quality']

We get the same results, but we can now run a cross-validation.

.. code:: ipython3

    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import cross_validate
    cross_val_score(model, X, y, cv=5)
.. parsed-literal::

    array([0.66923077, 0.66153846, 0.67513472, 0.66897614, 0.65588915])

scikit-learn 0.22 - stacking
----------------------------

Starting with version 0.22, *scikit-learn* introduced the
`StackingClassifier `__ model, with a slightly different design.

.. code:: ipython3

    try:
        from sklearn.ensemble import StackingClassifier
        skl = True
    except ImportError:
        # scikit-learn is too old
        skl = False

    if skl:
        model = StackingClassifier([
            ('ovrlr', OneVsRestClassifier(LogisticRegression(max_iter=1500))),
            ('rf', RandomForestClassifier())
        ])
        model.fit(X_train, y_train)
    else:
        model = None
    model

.. parsed-literal::

    C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\model_selection\_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=5.
      % (min_groups, self.n_splits)), UserWarning)
    C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\model_selection\_split.py:667: UserWarning: The least populated class in y has only 2 members, which is less than n_splits=5.
      % (min_groups, self.n_splits)), UserWarning)
    C:\xavierdupre\__home_\github_fork\scikit-learn\sklearn\linear_model\_logistic.py:939: ConvergenceWarning: lbfgs failed to converge (status=1):
    STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
    Increase the number of iterations (max_iter) or scale the data as shown in:
        https://scikit-learn.org/stable/modules/preprocessing.html.
    Please also refer to the documentation for alternative solver options:
        https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
      extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)

.. parsed-literal::

    StackingClassifier(cv=None,
                       estimators=[('ovrlr',
                                    OneVsRestClassifier(estimator=LogisticRegression(C=1.0,
                                        class_weight=None, dual=False,
                                        fit_intercept=True, intercept_scaling=1,
                                        l1_ratio=None, max_iter=1500,
                                        multi_class='auto', n_jobs=None,
                                        penalty='l2', random_state=None,
                                        solver='lbfgs', tol=0.0001, verbose=0,
                                        warm_start=False), n_jobs=None)),
                                   ('rf',
                                    RandomForestCla...
                                        max_depth=None, max_features='auto',
                                        max_leaf_nodes=None, max_samples=None,
                                        min_impurity_decrease=0.0,
                                        min_impurity_split=None,
                                        min_samples_leaf=1, min_samples_split=2,
                                        min_weight_fraction_leaf=0.0,
                                        n_estimators=100, n_jobs=None,
                                        oob_score=False, random_state=None,
                                        verbose=0, warm_start=False))],
                       final_estimator=None, n_jobs=None, passthrough=False,
                       stack_method='auto', verbose=0)

.. code:: ipython3

    if model is not None:
        fpr_pipe, tpr_pipe, th_pipe = roc_curve(y_test == model.predict(X_test),
                                                model.predict_proba(X_test).max(axis=1),
                                                drop_intermediate=False)
        auc_pipe = roc_auc_score(y_test == model.predict(X_test),
                                 model.predict_proba(X_test).max(axis=1))
    else:
        auc_pipe = None
    auc_pipe

.. parsed-literal::

    0.7301920159931777
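For reference, a self-contained sketch of the new design on the iris
data; the parameter values below (``final_estimator``,
``stack_method``, ``cv``) are illustrative choices, not the settings
used above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

model = StackingClassifier(
    estimators=[
        ('ovrlr', OneVsRestClassifier(LogisticRegression(max_iter=1500))),
        ('rf', RandomForestClassifier()),
    ],
    # Final model fed with the stacked scores; None would also default
    # to a LogisticRegression.
    final_estimator=LogisticRegression(max_iter=1500),
    # 'auto' picks predict_proba or decision_function for each estimator.
    stack_method='auto',
    # Out-of-fold predictions train the final estimator.
    cv=5)
model.fit(X, y)
print(model.predict_proba(X[:2]).shape)
```

Unlike the manual stacking above, ``StackingClassifier`` trains the
final estimator on out-of-fold predictions, which limits the leakage
of the training data into the final model.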