2A.ml - Time series - solution#


Predictions on time series.

from jyquickhelper import add_notebook_menu
add_notebook_menu()
%matplotlib inline

A time series#

We load the daily number of sessions of a website.

import pandas
data = pandas.read_csv("xavierdupre_sessions.csv", sep="\t")
data.set_index("Date", inplace=True)
data.head()
            Sessions
Date
28/10/2010         7
29/10/2010         6
30/10/2010         4
31/10/2010         6
01/11/2010         2
data.plot(figsize=(12,4));
../_images/td2a_timeseries_correction_5_0.png
data[-365:].plot(figsize=(12,4));
../_images/td2a_timeseries_correction_6_0.png
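
The index loaded above holds the dates as day/month/year strings, so a slice such as data[-365:] relies on row position rather than on the calendar. As a minimal sketch, assuming the same tab-separated layout and reusing the five rows shown in the output above, the dates can be parsed explicitly so that calendar information such as the day of the week becomes available. The names sample, df and weekday are only illustrative.

```python
import io
import pandas

# The five rows shown in the output above, in the same tab-separated layout.
sample = (
    "Date\tSessions\n"
    "28/10/2010\t7\n"
    "29/10/2010\t6\n"
    "30/10/2010\t4\n"
    "31/10/2010\t6\n"
    "01/11/2010\t2\n"
)

df = pandas.read_csv(io.StringIO(sample), sep="\t")
# Dates are written day/month/year, so the format is given explicitly
# instead of letting pandas guess it.
df["Date"] = pandas.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.set_index("Date").sort_index()

# With a DatetimeIndex the day of the week is directly available (Monday=0).
df["weekday"] = df.index.dayofweek
print(df)
```

A date-typed index also makes a weekly pattern easier to inspect, for instance by grouping the sessions by weekday.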

Removing seasonality without knowing its period#

With fit_seasons.

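from seasonal import fit_seasons
# fit_seasons estimates the period itself and returns the seasonal offsets
# (one value per position in the period) and the trend.
cv_seasons, trend = fit_seasons(data["Sessions"])
print(cv_seasons)
# cv_seasons has one value per position in the period, not one per row,
# so it cannot be stored as a column as is.
# data["cs_seasons"] = cv_seasons
data["trendcs"] = trend
# "trendsea" is assumed to have been computed in an earlier step not shown here.
data[-365:].plot(y=["Sessions", "trendcs", "trendsea"], figsize=(14,4));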
[ 26.66213008  16.33420353 -86.59519495 -73.57497492  33.23110565
  52.87820674  30.87516435]
../_images/td2a_timeseries_correction_19_1.png
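
fit_seasons returns one seasonal offset per position in the period (seven values here) together with the trend. To illustrate what such a decomposition computes, here is a minimal sketch on synthetic data. It is not the algorithm used by the seasonal package, only a simple moving-average variant in which the period is assumed to be known. The synthetic series, the offsets in true_seasons and all variable names are assumptions made for the example.

```python
import numpy as np
import pandas

rng = np.random.default_rng(0)
period = 7                       # assumed known here, unlike with fit_seasons
n = 52 * period

# Hypothetical weekly offsets (they sum to zero), not the values printed above.
true_seasons = np.array([25., 15., -85., -75., 35., 50., 35.])

t = np.arange(n)
series = pandas.Series(
    200 + 0.5 * t + true_seasons[t % period] + rng.normal(0, 5, n)
)

# Trend: centered moving average over one full period,
# which cancels the seasonal component.
trend = series.rolling(period, center=True).mean()

# Seasonal offsets: mean of the detrended values
# for each position in the period.
detrended = (series - trend).dropna()
seasons = detrended.groupby(detrended.index % period).mean()
seasons -= seasons.mean()        # center the offsets around zero

print(seasons.round(1).values)
```

On this synthetic series the estimated offsets should be close to true_seasons, since the moving average over seven days removes the weekly pattern from the trend.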

Autocorrelogram#

This follows the example: Autoregressive Moving Average (ARMA): Sunspots data.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = plot_acf(data["Sessions"], lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = plot_pacf(data["Sessions"], lags=40, ax=ax2);
../_images/td2a_timeseries_correction_21_1.png

The autocorrelogram confirms a period of 7: with daily data, this is a weekly cycle.
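
The figure produced by plot_acf shows the sample autocorrelation at each lag. As a sketch of what is read off that figure, the code below computes it by hand on a synthetic daily series with a weekly pattern and looks for the lag with the strongest correlation. The series, the offsets in weekly and the helper autocorr are assumptions made for the example, not part of the notebook.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )

rng = np.random.default_rng(0)
n = 364
# Hypothetical weekly offsets plus a small amount of noise.
weekly = np.array([25., 15., -85., -75., 35., 50., 35.])
x = 200 + weekly[np.arange(n) % 7] + rng.normal(0, 5, n)

acf = autocorr(x, 40)
# Lag 0 is always 1, so the search starts at lag 1.
best_lag = 1 + int(np.argmax(acf[1:]))
print(best_lag)
```

With a weekly pattern, the autocorrelation peaks at multiples of 7, and the biased estimator makes the peaks shrink slightly as the lag grows, so the strongest one is expected at lag 7.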

Regime changes#