.. _td2atimeseriescorrectionrst:

=======================================
2A.ml - Time series - solution
=======================================

.. only:: html

    **Links:** :download:`notebook `, :downloadlink:`html `, :download:`python `, :downloadlink:`slides `, :githublink:`GitHub|_doc/notebooks/td2a_ml/td2a_timeseries_correction.ipynb|*`

Predictions on time series.

.. code:: ipython3

    from jyquickhelper import add_notebook_menu
    add_notebook_menu()

.. contents::
    :local:

.. code:: ipython3

    %matplotlib inline

A time series
-------------

We load the daily number of sessions of a website.

.. code:: ipython3

    import pandas

    # The file is tab-separated; the "Date" column becomes the index.
    data = pandas.read_csv("xavierdupre_sessions.csv", sep="\t")
    data.set_index("Date", inplace=True)
    data.head()

==========  ========
Date        Sessions
==========  ========
28/10/2010  7
29/10/2010  6
30/10/2010  4
31/10/2010  6
01/11/2010  2
==========  ========
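The index shown above holds the dates as day/month/year strings. As a minimal sketch, assuming that format holds for the whole file, the dates could be parsed into a ``DatetimeIndex`` so that calendar-based selection works. The sample text and the names ``sample``, ``df`` and ``november`` are hypothetical; the sample only mimics the five rows displayed above and is not the real file.

```python
import io

import pandas

# Hypothetical tab-separated excerpt mimicking the rows displayed above.
sample = (
    "Date\tSessions\n"
    "28/10/2010\t7\n"
    "29/10/2010\t6\n"
    "30/10/2010\t4\n"
    "31/10/2010\t6\n"
    "01/11/2010\t2\n"
)

df = pandas.read_csv(io.StringIO(sample), sep="\t")

# Dates are day/month/year: an explicit format avoids any
# month/day ambiguity (01/11/2010 is the 1st of November).
df["Date"] = pandas.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.set_index("Date").sort_index()

# With a DatetimeIndex, partial-string selection by month is possible.
november = df.loc["2010-11"]
```

The rest of the notebook slices by position (``data[-365:]``), so this step is optional there. It matters mostly when rows have to be selected or resampled by calendar period.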
.. code:: ipython3

    data.plot(figsize=(12,4));

.. image:: td2a_timeseries_correction_5_0.png

.. code:: ipython3

    # Last 365 days only.
    data[-365:].plot(figsize=(12,4));

.. image:: td2a_timeseries_correction_6_0.png

Trends
------

We use the `detrend `__ function from statsmodels, which removes a polynomial trend (linear by default).

.. code:: ipython3

    from statsmodels.tsa.tsatools import detrend

    # detrend returns the residual; the trend is the series minus the residual.
    notrend = detrend(data['Sessions'])
    data["notrend"] = notrend
    data["trend"] = data['Sessions'] - notrend
    data.tail()

==========  ========  ==========  ==========
Date        Sessions  notrend     trend
==========  ========  ==========  ==========
30/10/2017  914       367.387637  546.612363
31/10/2017  863       316.119822  546.880178
01/11/2017  717       169.852008  547.147992
02/11/2017  884       336.584193  547.415807
03/11/2017  765       217.316379  547.683621
==========  ========  ==========  ==========
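To make the relation between the series, the residual and the trend concrete, here is a minimal sketch of a linear detrend done by hand with ``numpy.polyfit``. It assumes that a first-order detrend amounts to subtracting the least-squares line :math:`\alpha + \beta t`, which is my understanding of what statsmodels does but is not checked against it here. The series is synthetic and all variable names are illustrative.

```python
import numpy

rng = numpy.random.default_rng(0)
n = 200
t = numpy.arange(n, dtype=float)

# Synthetic series (not the website data): linear trend plus noise.
y = 5.0 + 0.3 * t + rng.normal(scale=2.0, size=n)

# Least-squares fit of y = alpha + beta * t.
# polyfit returns the coefficients from highest degree to lowest.
beta, alpha = numpy.polyfit(t, y, deg=1)
trend = alpha + beta * t

# The detrended series is the residual of that fit.
notrend = y - trend

# As in the notebook, the trend can be recovered as
# "series minus detrended series".
recovered_trend = y - notrend
```

Because the fit includes an intercept, the residual has zero mean and is orthogonal to :math:`t`. This explains why the ``trend`` column above grows by a constant amount from one day to the next.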
.. code:: ipython3

    data.plot(y=["Sessions", "notrend", "trend"], figsize=(14,4));

.. image:: td2a_timeseries_correction_9_0.png

We now fit a quadratic trend, :math:`Y=\alpha + \beta t + \gamma t^2`, by least squares.

.. code:: ipython3

    notrend2 = detrend(data['Sessions'], order=2)
    data["notrend2"] = notrend2
    data["trend2"] = data["Sessions"] - data["notrend2"]
    data.plot(y=["Sessions", "notrend2", "trend2"], figsize=(14,4));

.. image:: td2a_timeseries_correction_11_0.png

We repeat the linear detrend on the logarithm of the series.

.. code:: ipython3

    import numpy

    # +1 avoids log(0) on days without any session.
    data["logSess"] = numpy.log(data["Sessions"] + 1)
    lognotrend = detrend(data['logSess'])
    data["lognotrend"] = lognotrend
    data["logtrend"] = data["logSess"] - data["lognotrend"]
    data.plot(y=["logSess", "lognotrend", "logtrend"], figsize=(14,4));

.. image:: td2a_timeseries_correction_13_0.png

The series is rather unusual: it appears to go through a change of regime. We extract the seasonal component with `seasonal_decompose `__.

.. code:: ipython3

    from statsmodels.tsa.seasonal import seasonal_decompose

    # Weekly seasonality: period of 7 days, additive model by default.
    res = seasonal_decompose(data["Sessions"].values.ravel(), period=7, two_sided=False)
    data["season"] = res.seasonal
    data["trendsea"] = res.trend
    data.plot(y=["Sessions", "season", "trendsea"], figsize=(14,4));

.. image:: td2a_timeseries_correction_15_1.png

.. code:: ipython3

    data[-365:].plot(y=["Sessions", "season", "trendsea"], figsize=(14,4));

.. image:: td2a_timeseries_correction_16_0.png

.. code:: ipython3

    # The multiplicative model requires strictly positive values, hence the +1.
    res = seasonal_decompose(data["Sessions"].values.ravel() + 1, period=7,
                             two_sided=False, model='multiplicative')
    data["seasonp"] = res.seasonal
    data["trendseap"] = res.trend
    data[-365:].plot(y=["Sessions", "seasonp", "trendseap"], figsize=(14,4));

.. image:: td2a_timeseries_correction_17_1.png

Removing seasonality without knowing its period
-----------------------------------------------

With `fit_seasons `__.

.. code:: ipython3

    from seasonal import fit_seasons

    cv_seasons, trend = fit_seasons(data["Sessions"])
    print(cv_seasons)
    # cv_seasons holds one value per position in the cycle,
    # so it cannot be stored directly as a column:
    # data["cs_seasons"] = cv_seasons
    data["trendcs"] = trend
    data[-365:].plot(y=["Sessions", "trendcs", "trendsea"], figsize=(14,4));

.. parsed-literal::

    [ 26.66213008  16.33420353 -86.59519495 -73.57497492  33.23110565
      52.87820674  30.87516435]

.. image:: td2a_timeseries_correction_19_1.png

Autocorrelogram
---------------

We follow the example `Autoregressive Moving Average (ARMA): Sunspots data `__.

.. code:: ipython3

    import matplotlib.pyplot as plt
    from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

    fig = plt.figure(figsize=(12,8))
    # Top: autocorrelation; bottom: partial autocorrelation.
    ax1 = fig.add_subplot(211)
    fig = plot_acf(data["Sessions"], lags=40, ax=ax1)
    ax2 = fig.add_subplot(212)
    fig = plot_pacf(data["Sessions"], lags=40, ax=ax2);

.. image:: td2a_timeseries_correction_21_1.png

The period of 7 days shows up clearly, which matches the seven values returned by ``fit_seasons``.

Regime changes
--------------

- `Gaussian HMM of stock data `__
- `MixedLM `__
- `RLM `__
- `Local Linear Trend `__
- `MarkovAutoregression `__
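As a much simpler point of comparison than the models listed above, the following sketch locates a single change in the mean level by brute force. For every candidate split, it sums the squared deviations of each segment from its own mean and keeps the split with the lowest total. This is a naive least-squares breakpoint search, not any of the listed methods, and the function ``best_single_break``, its ``min_size`` parameter and the synthetic series are all illustrative assumptions.

```python
import numpy


def best_single_break(y, min_size=5):
    """Return the index k splitting y into y[:k] and y[k:] such that the
    total squared deviation of each segment from its own mean is minimal."""
    y = numpy.asarray(y, dtype=float)
    best_k, best_cost = None, numpy.inf
    # Each segment must keep at least min_size points.
    for k in range(min_size, len(y) - min_size + 1):
        left, right = y[:k], y[k:]
        cost = ((left - left.mean()) ** 2).sum() \
            + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


rng = numpy.random.default_rng(0)

# Synthetic series (not the website data): the mean level
# jumps from 2 to 6 at index 120.
series = numpy.concatenate([
    rng.normal(2.0, 0.3, size=120),
    rng.normal(6.0, 0.3, size=80),
])

k = best_single_break(series)
print("estimated break index:", k)
```

On the session counts, such a search would more plausibly be run on the detrended or log-transformed series, since the raw series also carries a trend and a weekly cycle that a piecewise-constant mean cannot capture.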