.. _winesregpolyrst:

==================================
Polynomial regression and pipeline
==================================

.. only:: html

    **Links:** :download:`notebook `, :downloadlink:`html `, :download:`PDF `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/lectures/wines_reg_poly.ipynb|*`

This notebook compares several polynomial regression models.

.. code:: ipython3

    %matplotlib inline

.. code:: ipython3

    from papierstat.datasets import load_wines_dataset
    data = load_wines_dataset()
    X = data.drop(['quality', 'color'], axis=1)
    y = data['quality']

.. code:: ipython3

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y)

We normalize the data. This matters all the more here because, without
it, the polynomial terms would take very large values, and numerical
libraries cope poorly with quantities of very different orders of
magnitude. Note that ``Normalizer`` rescales each row (sample) to unit
norm; it does not standardize each column.

.. code:: ipython3

    from sklearn.preprocessing import Normalizer
    norm = Normalizer()
    X_train_norm = norm.fit_transform(X_train)
    X_test_norm = norm.transform(X_test)

The `PolynomialFeatures `__ transformation creates new features by
multiplying the variables with one another. For degree two and three
features :math:`a, b, c`, it produces the features
:math:`1, a, b, c, a^2, ab, ac, b^2, bc, c^2`.

.. code:: ipython3

    from time import perf_counter
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import r2_score

    r2ts = []
    r2es = []
    degs = []
    tts = []
    models = []

    for d in range(1, 5):
        begin = perf_counter()
        pipe = make_pipeline(PolynomialFeatures(degree=d),
                             LinearRegression())
        pipe.fit(X_train_norm, y_train)
        duree = perf_counter() - begin
        # R2 on the training set and on the test set
        r2t = r2_score(y_train, pipe.predict(X_train_norm))
        r2e = r2_score(y_test, pipe.predict(X_test_norm))
        degs.append(d)
        r2ts.append(r2t)
        r2es.append(r2e)
        tts.append(duree)
        models.append(pipe)
        print(d, r2t, r2e, duree)

.. parsed-literal::

    1 0.189007413643138 0.17548948727814861 0.005909326000001158
    2 0.3090044704138045 0.3016856760353912 0.027130041999996024
    3 0.4065060987061494 -0.057880204420430736 0.22084438099999915
    4 0.5874526458338967 -3659.6472584680923 2.230189553999999

.. code:: ipython3

    import pandas
    df = pandas.DataFrame(dict(temps=tts, r2_train=r2ts, r2_test=r2es, degré=degs))
    df.set_index('degré')

.. csv-table::
    :header: "degré", "temps", "r2_train", "r2_test"

    1, 0.005909, 0.189007, 0.175489
    2, 0.027130, 0.309004, 0.301686
    3, 0.220844, 0.406506, -0.057880
    4, 2.230190, 0.587453, -3659.647258
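
The negative test scores in the table may look surprising for a quantity
called :math:`R^2`. The sketch below uses the usual definition
:math:`R^2 = 1 - SS_{res} / SS_{tot}`, which ``r2_score`` is documented to
follow. The helper ``r2`` and the toy values are illustrative assumptions
and are not part of the notebook.

.. code:: ipython3

    def r2(y_true, y_pred):
        """Coefficient of determination: 1 - SS_res / SS_tot."""
        mean = sum(y_true) / len(y_true)
        ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
        ss_tot = sum((t - mean) ** 2 for t in y_true)
        return 1 - ss_res / ss_tot

    y_true = [5, 6, 5, 7, 6]      # hypothetical quality scores
    constant = [5.8] * 5          # always predict the mean of y_true
    wild = [5, 6, 40, 7, 6]       # a single prediction far off target

    print(r2(y_true, constant))   # about 0: no better than the mean
    print(r2(y_true, wild))       # strongly negative

A score of 0 means the model does no better than predicting the mean, and
a few extreme predictions are enough to push the score far below zero,
which is one plausible reading of the degree-4 result on the test set.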
The degree-2 polynomial appears to be the best model. Computation time
grows by a factor of roughly 5 to 10 with each additional degree, which
follows the growth in the number of features (12, 78, 364 and 1365
columns for degrees 1 to 4 with 11 input variables). Adding cross
features clearly helps on this dataset. From degree 3 onward, however,
the regression performs very poorly on the test set even though the score
keeps improving on the training set. Let us look at this in more detail.

.. code:: ipython3

    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(1, 2, figsize=(12, 4))
    n = 15
    # drop=True keeps only the target values, not the original index
    ax[0].plot(y_train[:n].reset_index(drop=True), 'o')
    ax[1].plot(y_test[:n].reset_index(drop=True), 'o')
    ax[0].set_title('Predictions on a few values\ntraining')
    ax[1].set_title('Predictions on a few values\ntest')
    for x in ax:
        x.set_ylim([3, 9])
        x.get_xaxis().set_visible(False)

    for model in models:
        d = model.get_params()['polynomialfeatures__degree']
        tr = model.predict(X_train_norm[:n])
        te = model.predict(X_test_norm[:n])
        ax[0].plot(tr, label="d=%d" % d)
        ax[1].plot(te, label="d=%d" % d)
    ax[0].legend()
    ax[1].legend();

.. image:: wines_reg_poly_10_0.png

The degree-4 model looks accurate on the training set but goes completely
astray on the test set, as if it were surprised by the values it
encounters there. The model is said to overfit (`sur-apprentissage `__
in French, `overfitting `__ in English). The degree-2 polynomial works
better than plain linear regression. A natural question is which cross
variables affect performance. We use the `statsmodels `__ module to find
out.

.. code:: ipython3

    poly = PolynomialFeatures(degree=2)
    poly_feat_train = poly.fit_transform(X_train_norm)
    # reuse the transformation fitted on the training set
    poly_feat_test = poly.transform(X_test_norm)

.. code:: ipython3

    from statsmodels.regression.linear_model import OLS
    model = OLS(y_train, poly_feat_train)
    results = model.fit()
    results.summary2()

.. raw:: html
Model: OLS Adj. R-squared: 0.302
Dependent Variable: quality AIC: 10821.7528
Date: 2018-09-09 14:59 BIC: 11321.5798
No. Observations: 4872 Log-Likelihood: -5333.9
Df Model: 76 F-statistic: 28.70
Df Residuals: 4795 Prob (F-statistic): 0.00
R-squared: 0.313 Scale: 0.53135
Coef. Std.Err. t P>|t| [0.025 0.975]
const -1062.0749 1873.8933 -0.5668 0.5709 -4735.7657 2611.6159
x1 -2.0670 24.3092 -0.0850 0.9322 -49.7241 45.5901
x2 -729.2270 157.0307 -4.6439 0.0000 -1037.0792 -421.3748
x3 8.7231 200.5623 0.0435 0.9653 -384.4709 401.9171
x4 3.4131 13.3672 0.2553 0.7985 -22.7927 29.6188
x5 -1171.6645 689.5425 -1.6992 0.0893 -2523.4842 180.1551
x6 40.1316 8.1636 4.9159 0.0000 24.1272 56.1360
x7 70.8825 22.2599 3.1843 0.0015 27.2429 114.5221
x8 -724.4624 724.6583 -0.9997 0.3175 -2145.1251 696.2003
x9 -251.7727 192.4538 -1.3082 0.1909 -629.0704 125.5250
x10 -276.1104 163.5189 -1.6886 0.0914 -596.6824 44.4616
x11 258.8220 24.9389 10.3782 0.0000 209.9303 307.7138
x12 1021.3730 1866.5461 0.5472 0.5843 -2637.9138 4680.6598
x13 394.9815 155.7356 2.5362 0.0112 89.6682 700.2947
x14 250.5746 208.6039 1.2012 0.2297 -158.3848 659.5340
x15 -4.4734 21.4718 -0.2083 0.8350 -46.5681 37.6213
x16 -829.1409 537.0924 -1.5438 0.1227 -1882.0886 223.8067
x17 -7.9049 9.5287 -0.8296 0.4068 -26.5856 10.7758
x18 5.7595 21.3304 0.2700 0.7872 -36.0579 47.5770
x19 375.0969 1075.4182 0.3488 0.7273 -1733.2162 2483.4100
x20 114.3253 261.5978 0.4370 0.6621 -398.5264 627.1770
x21 -134.9056 151.9652 -0.8877 0.3747 -432.8272 163.0159
x22 -72.6985 26.4607 -2.7474 0.0060 -124.5735 -20.8234
x23 520.0041 1956.5000 0.2658 0.7904 -3315.6336 4355.6418
x24 -325.9049 1598.8675 -0.2038 0.8385 -3460.4188 2808.6090
x25 -231.1519 129.0113 -1.7917 0.0732 -484.0732 21.7694
x26 4345.4355 4193.6897 1.0362 0.3002 -3876.1206 12566.9917
x27 169.8998 63.4267 2.6787 0.0074 45.5543 294.2453
x28 542.4856 138.8860 3.9060 0.0001 270.2053 814.7658
x29 -8778.6169 5436.2565 -1.6148 0.1064 -19436.1739 1878.9402
x30 1809.6247 1561.8380 1.1587 0.2467 -1252.2945 4871.5438
x31 -2415.5578 1236.4205 -1.9537 0.0508 -4839.5094 8.3938
x32 749.0360 191.8871 3.9035 0.0001 372.8492 1125.2228
x33 -566.9825 2058.9378 -0.2754 0.7830 -4603.4454 3469.4803
x34 -107.1104 167.6736 -0.6388 0.5230 -435.8275 221.6068
x35 3906.7743 5806.8281 0.6728 0.5011 -7477.2732 15290.8218
x36 10.0617 81.2204 0.1239 0.9014 -149.1677 169.2910
x37 -24.5923 177.8119 -0.1383 0.8900 -373.1851 324.0006
x38 743.3110 8266.7074 0.0899 0.9284 -15463.2288 16949.8507
x39 -2927.3589 2162.4656 -1.3537 0.1759 -7166.7837 1312.0660
x40 1943.9211 1604.4419 1.2116 0.2257 -1201.5212 5089.3633
x41 604.3677 250.6121 2.4116 0.0159 113.0530 1095.6823
x42 1020.5273 1873.8778 0.5446 0.5860 -2653.1330 4694.1876
x43 1250.7078 760.6158 1.6443 0.1002 -240.4481 2741.8637
x44 -10.1449 4.8977 -2.0714 0.0384 -19.7466 -0.5432
x45 2.2443 12.2455 0.1833 0.8546 -21.7624 26.2511
x46 616.4495 725.3393 0.8499 0.3954 -805.5484 2038.4474
x47 -134.1735 203.3448 -0.6598 0.5094 -532.8226 264.4757
x48 -123.4359 167.3097 -0.7378 0.4607 -451.4397 204.5678
x49 -4.4518 20.8884 -0.2131 0.8312 -45.4026 36.4990
x50 4274.1441 6764.6180 0.6318 0.5275 -8987.6111 17535.8993
x51 -361.0029 283.6415 -1.2727 0.2032 -917.0705 195.0647
x52 1132.4796 607.3146 1.8647 0.0623 -58.1356 2323.0948
x53 32386.1105 22852.8882 1.4172 0.1565 -12416.0364 77188.2575
x54 -9169.2287 6043.2328 -1.5173 0.1293 -21016.7378 2678.2804
x55 -3935.3569 4533.8479 -0.8680 0.3854 -12823.7790 4953.0653
x56 1044.0159 730.5207 1.4291 0.1530 -388.1398 2476.1716
x57 1011.9144 1874.0030 0.5400 0.5892 -2661.9913 4685.8201
x58 -31.4356 7.2561 -4.3323 0.0000 -45.6610 -17.2103
x59 527.6354 299.1980 1.7635 0.0779 -58.9301 1114.2008
x60 -35.3657 79.2158 -0.4464 0.6553 -190.6650 119.9336
x61 77.6789 65.5638 1.1848 0.2362 -50.8562 206.2139
x62 -57.8306 9.2836 -6.2293 0.0000 -76.0308 -39.6304
x63 995.5783 1874.0603 0.5312 0.5953 -2678.4398 4669.5964
x64 28.6082 645.1442 0.0443 0.9646 -1236.1706 1293.3869
x65 296.2682 172.0530 1.7220 0.0851 -41.0347 633.5711
x66 296.9037 146.6463 2.0246 0.0430 9.4096 584.3978
x67 -196.1275 22.5097 -8.7130 0.0000 -240.2568 -151.9981
x68 -6153.6885 18362.8686 -0.3351 0.7376 -42153.3367 29845.9598
x69 14642.0880 9841.6551 1.4878 0.1369 -4652.0717 33936.2477
x70 -1516.3079 5845.7221 -0.2594 0.7953 -12976.6055 9943.9897
x71 -2197.1628 981.3622 -2.2389 0.0252 -4121.0830 -273.2427
x72 -2425.8554 1193.3038 -2.0329 0.0421 -4765.2784 -86.4324
x73 1641.2450 1566.4224 1.0478 0.2948 -1429.6615 4712.1516
x74 725.5301 253.2564 2.8648 0.0042 229.0313 1222.0290
x75 -1657.2401 2016.9159 -0.8217 0.4113 -5611.3208 2296.8406
x76 398.5591 200.1289 1.9915 0.0465 6.2146 790.9036
x77 898.1505 1871.4360 0.4799 0.6313 -2770.7228 4567.0237
Omnibus: 85.458 Durbin-Watson: 1.991
Prob(Omnibus): 0.000 Jarque-Bera (JB): 148.619
Skew: 0.132 Prob(JB): 0.000
Kurtosis: 3.814 Condition No.: 4493323340279420
This is not very readable. We need to attach a name to each variable and
start again.

.. code:: ipython3

    # get_feature_names was removed in scikit-learn 1.2,
    # get_feature_names_out replaces it
    names = poly.get_feature_names_out(input_features=data.columns[:-2])
    names = [n.replace(" ", " * ") for n in names]
    pft = pandas.DataFrame(poly_feat_train, columns=names)
    pft.head()

.. csv-table::
    :header: "", "1", "fixed_acidity", "volatile_acidity", "citric_acid", "residual_sugar", "chlorides", "free_sulfur_dioxide", "total_sulfur_dioxide", "density", "pH", "...", "density^2", "density * pH", "density * sulphates", "density * alcohol", "pH^2", "pH * sulphates", "pH * alcohol", "sulphates^2", "sulphates * alcohol", "alcohol^2"

    0, 1.0, 0.034535, 0.001889, 0.002752, 0.042089, 0.000297, 0.285993, 0.955108, 0.005369, 0.016836, ..., 0.000029, 0.000090, 0.000013, 0.000278, 0.000283, 0.000041, 0.000872, 0.000006, 0.000126, 0.002683
    1, 1.0, 0.056514, 0.002868, 0.003627, 0.013496, 0.000346, 0.244614, 0.961585, 0.008352, 0.027245, ..., 0.000070, 0.000228, 0.000031, 0.000888, 0.000742, 0.000101, 0.002896, 0.000014, 0.000394, 0.011296
    2, 1.0, 0.062938, 0.001033, 0.002442, 0.139027, 0.000498, 0.413325, 0.892406, 0.009363, 0.030060, ..., 0.000088, 0.000281, 0.000031, 0.000862, 0.000904, 0.000099, 0.002767, 0.000011, 0.000303, 0.008475
    3, 1.0, 0.120882, 0.004638, 0.005622, 0.036546, 0.001167, 0.224896, 0.955808, 0.014025, 0.046385, ..., 0.000197, 0.000651, 0.000095, 0.001853, 0.002152, 0.000313, 0.006129, 0.000046, 0.000891, 0.017457
    4, 1.0, 0.455253, 0.018602, 0.023497, 0.186017, 0.006462, 0.146856, 0.538471, 0.048745, 0.158115, ..., 0.002376, 0.007707, 0.001360, 0.031497, 0.025000, 0.004412, 0.102168, 0.000779, 0.018030, 0.417530

5 rows × 78 columns
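
The 78 columns can be recovered by enumerating every monomial of total
degree at most 2 over the 11 input variables. The sketch below is a
pure-Python illustration of that enumeration, not scikit-learn's
implementation; the helper ``poly_feature_names`` is a hypothetical name
introduced here.

.. code:: ipython3

    from itertools import combinations_with_replacement
    from math import comb

    def poly_feature_names(names, degree):
        """Names of all monomials of total degree <= degree, constant first."""
        out = []
        for d in range(degree + 1):
            for combo in combinations_with_replacement(names, d):
                if not combo:
                    out.append("1")  # the constant term
                    continue
                # group repeated variables into powers: (a, a) -> a^2
                terms = []
                for name in dict.fromkeys(combo):
                    power = combo.count(name)
                    terms.append(name if power == 1 else "%s^%d" % (name, power))
                out.append(" * ".join(terms))
        return out

    print(poly_feature_names(["a", "b", "c"], 2))
    # ['1', 'a', 'b', 'c', 'a^2', 'a * b', 'a * c', 'b^2', 'b * c', 'c^2']

    # number of columns for 11 input variables and degrees 1 to 4
    print([comb(11 + d, d) for d in range(1, 5)])
    # [12, 78, 364, 1365]

The count :math:`\binom{n + d}{d}` grows quickly with the degree
:math:`d`, which is consistent with the training times measured earlier.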

.. code:: ipython3

    results.summary2(xname=pft.columns)

.. raw:: html
Model: OLS Adj. R-squared: 0.302
Dependent Variable: quality AIC: 10821.7528
Date: 2018-09-09 15:02 BIC: 11321.5798
No. Observations: 4872 Log-Likelihood: -5333.9
Df Model: 76 F-statistic: 28.70
Df Residuals: 4795 Prob (F-statistic): 0.00
R-squared: 0.313 Scale: 0.53135
Coef. Std.Err. t P>|t| [0.025 0.975]
1 -1062.0749 1873.8933 -0.5668 0.5709 -4735.7657 2611.6159
fixed_acidity -2.0670 24.3092 -0.0850 0.9322 -49.7241 45.5901
volatile_acidity -729.2270 157.0307 -4.6439 0.0000 -1037.0792 -421.3748
citric_acid 8.7231 200.5623 0.0435 0.9653 -384.4709 401.9171
residual_sugar 3.4131 13.3672 0.2553 0.7985 -22.7927 29.6188
chlorides -1171.6645 689.5425 -1.6992 0.0893 -2523.4842 180.1551
free_sulfur_dioxide 40.1316 8.1636 4.9159 0.0000 24.1272 56.1360
total_sulfur_dioxide 70.8825 22.2599 3.1843 0.0015 27.2429 114.5221
density -724.4624 724.6583 -0.9997 0.3175 -2145.1251 696.2003
pH -251.7727 192.4538 -1.3082 0.1909 -629.0704 125.5250
sulphates -276.1104 163.5189 -1.6886 0.0914 -596.6824 44.4616
alcohol 258.8220 24.9389 10.3782 0.0000 209.9303 307.7138
fixed_acidity^2 1021.3730 1866.5461 0.5472 0.5843 -2637.9138 4680.6598
fixed_acidity * volatile_acidity 394.9815 155.7356 2.5362 0.0112 89.6682 700.2947
fixed_acidity * citric_acid 250.5746 208.6039 1.2012 0.2297 -158.3848 659.5340
fixed_acidity * residual_sugar -4.4734 21.4718 -0.2083 0.8350 -46.5681 37.6213
fixed_acidity * chlorides -829.1409 537.0924 -1.5438 0.1227 -1882.0886 223.8067
fixed_acidity * free_sulfur_dioxide -7.9049 9.5287 -0.8296 0.4068 -26.5856 10.7758
fixed_acidity * total_sulfur_dioxide 5.7595 21.3304 0.2700 0.7872 -36.0579 47.5770
fixed_acidity * density 375.0969 1075.4182 0.3488 0.7273 -1733.2162 2483.4100
fixed_acidity * pH 114.3253 261.5978 0.4370 0.6621 -398.5264 627.1770
fixed_acidity * sulphates -134.9056 151.9652 -0.8877 0.3747 -432.8272 163.0159
fixed_acidity * alcohol -72.6985 26.4607 -2.7474 0.0060 -124.5735 -20.8234
volatile_acidity^2 520.0041 1956.5000 0.2658 0.7904 -3315.6336 4355.6418
volatile_acidity * citric_acid -325.9049 1598.8675 -0.2038 0.8385 -3460.4188 2808.6090
volatile_acidity * residual_sugar -231.1519 129.0113 -1.7917 0.0732 -484.0732 21.7694
volatile_acidity * chlorides 4345.4355 4193.6897 1.0362 0.3002 -3876.1206 12566.9917
volatile_acidity * free_sulfur_dioxide 169.8998 63.4267 2.6787 0.0074 45.5543 294.2453
volatile_acidity * total_sulfur_dioxide 542.4856 138.8860 3.9060 0.0001 270.2053 814.7658
volatile_acidity * density -8778.6169 5436.2565 -1.6148 0.1064 -19436.1739 1878.9402
volatile_acidity * pH 1809.6247 1561.8380 1.1587 0.2467 -1252.2945 4871.5438
volatile_acidity * sulphates -2415.5578 1236.4205 -1.9537 0.0508 -4839.5094 8.3938
volatile_acidity * alcohol 749.0360 191.8871 3.9035 0.0001 372.8492 1125.2228
citric_acid^2 -566.9825 2058.9378 -0.2754 0.7830 -4603.4454 3469.4803
citric_acid * residual_sugar -107.1104 167.6736 -0.6388 0.5230 -435.8275 221.6068
citric_acid * chlorides 3906.7743 5806.8281 0.6728 0.5011 -7477.2732 15290.8218
citric_acid * free_sulfur_dioxide 10.0617 81.2204 0.1239 0.9014 -149.1677 169.2910
citric_acid * total_sulfur_dioxide -24.5923 177.8119 -0.1383 0.8900 -373.1851 324.0006
citric_acid * density 743.3110 8266.7074 0.0899 0.9284 -15463.2288 16949.8507
citric_acid * pH -2927.3589 2162.4656 -1.3537 0.1759 -7166.7837 1312.0660
citric_acid * sulphates 1943.9211 1604.4419 1.2116 0.2257 -1201.5212 5089.3633
citric_acid * alcohol 604.3677 250.6121 2.4116 0.0159 113.0530 1095.6823
residual_sugar^2 1020.5273 1873.8778 0.5446 0.5860 -2653.1330 4694.1876
residual_sugar * chlorides 1250.7078 760.6158 1.6443 0.1002 -240.4481 2741.8637
residual_sugar * free_sulfur_dioxide -10.1449 4.8977 -2.0714 0.0384 -19.7466 -0.5432
residual_sugar * total_sulfur_dioxide 2.2443 12.2455 0.1833 0.8546 -21.7624 26.2511
residual_sugar * density 616.4495 725.3393 0.8499 0.3954 -805.5484 2038.4474
residual_sugar * pH -134.1735 203.3448 -0.6598 0.5094 -532.8226 264.4757
residual_sugar * sulphates -123.4359 167.3097 -0.7378 0.4607 -451.4397 204.5678
residual_sugar * alcohol -4.4518 20.8884 -0.2131 0.8312 -45.4026 36.4990
chlorides^2 4274.1441 6764.6180 0.6318 0.5275 -8987.6111 17535.8993
chlorides * free_sulfur_dioxide -361.0029 283.6415 -1.2727 0.2032 -917.0705 195.0647
chlorides * total_sulfur_dioxide 1132.4796 607.3146 1.8647 0.0623 -58.1356 2323.0948
chlorides * density 32386.1105 22852.8882 1.4172 0.1565 -12416.0364 77188.2575
chlorides * pH -9169.2287 6043.2328 -1.5173 0.1293 -21016.7378 2678.2804
chlorides * sulphates -3935.3569 4533.8479 -0.8680 0.3854 -12823.7790 4953.0653
chlorides * alcohol 1044.0159 730.5207 1.4291 0.1530 -388.1398 2476.1716
free_sulfur_dioxide^2 1011.9144 1874.0030 0.5400 0.5892 -2661.9913 4685.8201
free_sulfur_dioxide * total_sulfur_dioxide -31.4356 7.2561 -4.3323 0.0000 -45.6610 -17.2103
free_sulfur_dioxide * density 527.6354 299.1980 1.7635 0.0779 -58.9301 1114.2008
free_sulfur_dioxide * pH -35.3657 79.2158 -0.4464 0.6553 -190.6650 119.9336
free_sulfur_dioxide * sulphates 77.6789 65.5638 1.1848 0.2362 -50.8562 206.2139
free_sulfur_dioxide * alcohol -57.8306 9.2836 -6.2293 0.0000 -76.0308 -39.6304
total_sulfur_dioxide^2 995.5783 1874.0603 0.5312 0.5953 -2678.4398 4669.5964
total_sulfur_dioxide * density 28.6082 645.1442 0.0443 0.9646 -1236.1706 1293.3869
total_sulfur_dioxide * pH 296.2682 172.0530 1.7220 0.0851 -41.0347 633.5711
total_sulfur_dioxide * sulphates 296.9037 146.6463 2.0246 0.0430 9.4096 584.3978
total_sulfur_dioxide * alcohol -196.1275 22.5097 -8.7130 0.0000 -240.2568 -151.9981
density^2 -6153.6885 18362.8686 -0.3351 0.7376 -42153.3367 29845.9598
density * pH 14642.0880 9841.6551 1.4878 0.1369 -4652.0717 33936.2477
density * sulphates -1516.3079 5845.7221 -0.2594 0.7953 -12976.6055 9943.9897
density * alcohol -2197.1628 981.3622 -2.2389 0.0252 -4121.0830 -273.2427
pH^2 -2425.8554 1193.3038 -2.0329 0.0421 -4765.2784 -86.4324
pH * sulphates 1641.2450 1566.4224 1.0478 0.2948 -1429.6615 4712.1516
pH * alcohol 725.5301 253.2564 2.8648 0.0042 229.0313 1222.0290
sulphates^2 -1657.2401 2016.9159 -0.8217 0.4113 -5611.3208 2296.8406
sulphates * alcohol 398.5591 200.1289 1.9915 0.0465 6.2146 790.9036
alcohol^2 898.1505 1871.4360 0.4799 0.6313 -2770.7228 4567.0237
Omnibus: 85.458 Durbin-Watson: 1.991
Prob(Omnibus): 0.000 Jarque-Bera (JB): 148.619
Skew: 0.132 Prob(JB): 0.000
Kurtosis: 3.814 Condition No.: 4493323340279420
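
The columns of the summary are tied together: ``t`` is the coefficient
divided by its standard error, and the p-value measures how extreme that
ratio is. The sketch below checks this on three rows copied from the
table. It relies on a normal approximation, which is an assumption made
here for simplicity (statsmodels uses Student's t distribution), so the
p-values match the table only approximately. The helper ``t_and_p`` is a
hypothetical name.

.. code:: ipython3

    from math import erfc, sqrt

    # (coefficient, standard error) copied from the summary above
    rows = {
        "alcohol": (258.8220, 24.9389),
        "volatile_acidity": (-729.2270, 157.0307),
        "fixed_acidity": (-2.0670, 24.3092),
    }

    def t_and_p(coef, stderr):
        """t statistic and two-sided p-value under a normal approximation."""
        t = coef / stderr
        p = erfc(abs(t) / sqrt(2))  # equals 2 * (1 - Phi(|t|))
        return t, p

    for name, (coef, stderr) in rows.items():
        t, p = t_and_p(coef, stderr)
        print("%-18s t=%8.4f p=%.3g keep=%s" % (name, t, p, p <= 0.05))

With 4795 residual degrees of freedom the two distributions are close, so
the approximation is enough to see why ``alcohol`` passes a 0.05 threshold
while ``fixed_acidity`` does not.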
We keep only the variables whose `p-value `__ is at most 0.05.

.. code:: ipython3

    pval = results.pvalues.copy()
    pval[pval <= 0.05]

.. parsed-literal::

    x2     3.511159e-06
    x6     9.131393e-07
    x7     1.460269e-03
    x11    5.715290e-25
    x13    1.123675e-02
    x22    6.029122e-03
    x27    7.416579e-03
    x28    9.513630e-05
    x32    9.610320e-05
    x41    1.592150e-02
    x44    3.837865e-02
    x58    1.505845e-05
    x62    5.085920e-10
    x66    4.296131e-02
    x67    4.014502e-18
    x71    2.520870e-02
    x72    4.211861e-02
    x74    4.190811e-03
    x76    4.648129e-02
    dtype: float64

.. code:: ipython3

    # replace x1, x2, ... with the readable feature names
    pval.index = pft.columns
    pval[pval <= 0.05]

.. parsed-literal::

    volatile_acidity                              3.511159e-06
    free_sulfur_dioxide                           9.131393e-07
    total_sulfur_dioxide                          1.460269e-03
    alcohol                                       5.715290e-25
    fixed_acidity * volatile_acidity              1.123675e-02
    fixed_acidity * alcohol                       6.029122e-03
    volatile_acidity * free_sulfur_dioxide        7.416579e-03
    volatile_acidity * total_sulfur_dioxide       9.513630e-05
    volatile_acidity * alcohol                    9.610320e-05
    citric_acid * alcohol                         1.592150e-02
    residual_sugar * free_sulfur_dioxide          3.837865e-02
    free_sulfur_dioxide * total_sulfur_dioxide    1.505845e-05
    free_sulfur_dioxide * alcohol                 5.085920e-10
    total_sulfur_dioxide * sulphates              4.296131e-02
    total_sulfur_dioxide * alcohol                4.014502e-18
    density * alcohol                             2.520870e-02
    pH^2                                          4.211861e-02
    pH * alcohol                                  4.190811e-03
    sulphates * alcohol                           4.648129e-02
    dtype: float64

The model performs better, but it is harder to tell whether alcohol
contributes positively to quality, because alcohol appears in more than
one variable.
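
To make this last remark concrete, here is a sketch of the marginal
effect of alcohol in a degree-2 model. The notation is an assumption
introduced for illustration: :math:`x_a` is the (normalized) alcohol
feature, :math:`\beta_a` and :math:`\beta_{aa}` are the coefficients of
``alcohol`` and ``alcohol^2``, and :math:`\beta_{ja}` is the coefficient
of the cross term between variable :math:`j` and alcohol.

.. math::

    \frac{\partial \hat{y}}{\partial x_a}
    = \beta_a + 2 \beta_{aa} x_a + \sum_{j \neq a} \beta_{ja} x_j

The sign of this derivative depends on the values of the other variables,
so the sign of :math:`\beta_a` alone does not say whether more alcohol
goes with higher quality.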