Estimating the parameters of a SIRD model

We start from a CovidSIRD model used to simulate data, then check whether the model parameters can be re-estimated from those observations.

Data simulation

import warnings
from pprint import pprint
import numpy
from matplotlib.cbook.deprecation import MatplotlibDeprecationWarning
import matplotlib.pyplot as plt
import pandas
from aftercovid.models import EpidemicRegressor, CovidSIRD

model = CovidSIRD()
model

CovidSIRD

Quantities

  • S: uninfected individuals
  • I: number of infected or contagious individuals
  • R: recovered individuals
  • D: deceased individuals

Constants

  • N: population

Parameters

  • beta: transmission rate in the population
  • mu: 1/mu is the average time to recovery
  • nu: 1/nu is the average time to death

Equations

  • \begin{equation}\frac{dS}{dt} = - \frac{\beta I S}{N}\end{equation}
  • \begin{equation}\frac{dI}{dt} = \frac{\beta I S}{N} - \mu I - \nu I\end{equation}
  • \begin{equation}\frac{dR}{dt} = \mu I\end{equation}
  • \begin{equation}\frac{dD}{dt} = \nu I\end{equation}


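The system above translates directly into code. A minimal sketch (the helper sird_derivatives is hypothetical, not part of aftercovid):

```python
def sird_derivatives(S, I, R, D, beta, mu, nu, N):
    """Right-hand side of the SIRD system above (hypothetical helper,
    not part of aftercovid)."""
    dS = -beta * S * I / N
    dI = beta * S * I / N - mu * I - nu * I
    dR = mu * I
    dD = nu * I
    return dS, dI, dR, dD

# Derivatives at the starting point of this example (S=9990, I=10).
d0 = sird_derivatives(9990.0, 10.0, 0.0, 0.0, beta=0.4, mu=0.06, nu=0.04, N=10000.0)
```

The four derivatives sum to zero, so the total population N stays constant along the trajectory.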
Updating the coefficients.

model['beta'] = 0.4
model["mu"] = 0.06
model["nu"] = 0.04
pprint(model.P)

Out:

[('beta', 0.4, 'taux de transmission dans la population'),
 ('mu', 0.06, '1/. : durée moyenne jusque la guérison'),
 ('nu', 0.04, "1/. : durée moyenne jusqu'au décès")]

Starting point

pprint(model.Q)

Out:

[('S', 9990.0, 'personnes non contaminées'),
 ('I', 10.0, 'nombre de personnes malades ou contaminantes'),
 ('R', 0.0, 'personnes guéries (recovered)'),
 ('D', 0.0, 'personnes décédées')]

Simulation

X, y = model.iterate2array(50, derivatives=True)
data = {_[0]: x for _, x in zip(model.Q, X.T)}
data.update({('d' + _[0]): c for _, c in zip(model.Q, y.T)})
df = pandas.DataFrame(data)
df.tail()
S I R D dS dI dR dD
45 306.649109 1543.747681 4889.761719 3259.841309 -18.935555 -135.439209 92.624863 61.749905
46 287.713562 1408.308472 4982.386719 3321.591309 -16.207579 -124.623268 84.498505 56.332336
47 271.505981 1283.685181 5066.885254 3377.923584 -13.941129 -114.427391 77.021111 51.347408
48 257.564850 1169.257812 5143.906250 3429.270996 -12.046389 -104.879387 70.155464 46.770313
49 245.518478 1064.378418 5214.062012 3476.041260 -10.452982 -95.984856 63.862705 42.575134

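Each table row advances the state by one day using its derivatives. A minimal explicit-Euler sketch with a unit time step (an assumption; iterate2array may use a finer integration scheme):

```python
def euler_sird(S, I, R, D, beta, mu, nu, N, steps):
    """Explicit Euler integration of the SIRD system, one unit step
    per iteration (a sketch, not aftercovid's actual integrator)."""
    traj = [(S, I, R, D)]
    for _ in range(steps):
        dS = -beta * S * I / N
        dI = -dS - (mu + nu) * I
        dR = mu * I
        dD = nu * I
        S, I, R, D = S + dS, I + dI, R + dR, D + dD
        traj.append((S, I, R, D))
    return traj

# Same starting point and coefficients as the simulation above.
traj = euler_sird(9990.0, 10.0, 0.0, 0.0, 0.4, 0.06, 0.04, 10000.0, 50)
```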

Visualization

df.plot(title="Simulation SIRD")
[figure: Simulation SIRD]

Estimation

The module implements the EpidemicRegressor class, which replicates the scikit-learn API.

m = EpidemicRegressor('SIRD', verbose=True, learning_rate_init=1e-3,
                      max_iter=10, early_th=1)
m.fit(X, y)
pprint(m.model_.P)

Out:

0/10: loss: 2.562e+06 lr=1e-09
1/10: loss: 1.604e+05 lr=1e-09
2/10: loss: 2.394e+04 lr=1e-09
3/10: loss: 4101 lr=1e-09
4/10: loss: 812.9 lr=1e-09
5/10: loss: 40.23 lr=1e-09
6/10: loss: 6.011 lr=1e-09
7/10: loss: 0.3973 lr=1e-09
8/10: loss: 0.0847 lr=1e-09
9/10: loss: 0.007512 lr=1e-09
10/10: loss: 0.0007915 lr=1e-09
[('beta', 0.40000425354609376, 'taux de transmission dans la population'),
 ('mu', 0.06000026319128172, '1/. : durée moyenne jusque la guérison'),
 ('nu', 0.0400002626430142, "1/. : durée moyenne jusqu'au décès")]
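For a SIRD model, the basic reproduction number is R0 = beta / (mu + nu): the transmission rate divided by the total removal rate (recovery plus death). With the true coefficients used above:

```python
# R0 for a SIRD model: transmission rate over total removal rate.
beta, mu, nu = 0.4, 0.06, 0.04
R0 = beta / (mu + nu)
```

This is the value (around 4) that the windowed estimates below should recover.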

The population's reaction is not constant over the course of the epidemic: behaviour may change as the disease spreads. The model coefficients are therefore estimated on a sliding window.

def find_best_model(Xt, yt, lrs, th):
    # Tries every learning rate in lrs, keeps the estimator with the
    # lowest loss, and stops early once the loss falls below th.
    best_est, best_loss = None, None
    for lr in lrs:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", RuntimeWarning)
            m = EpidemicRegressor(
                'SIRD',
                learning_rate_init=lr,
                max_iter=500,
                early_th=1)
            m.fit(Xt, yt)
            loss = m.score(Xt, yt)
            if numpy.isnan(loss):
                continue
        if best_est is None or best_loss > loss:
            best_est, best_loss = m, loss
        if best_loss < th:
            return best_est, best_loss
    return best_est, best_loss

The model coefficients are estimated every 2 days, each time on the last 10 days of observations.

coefs = []
for k in range(0, X.shape[0] - 9, 2):
    end = min(k + 10, X.shape[0])
    Xt, yt = X[k:end], y[k:end]
    m, loss = find_best_model(Xt, yt, [1e-2, 1e-3], 10)
    loss = m.score(Xt, yt)
    print("k={} iter={} loss={:1.3f} coef={}".format(
        k, m.iter_, loss, m.model_._val_p))
    obs = dict(k=k, loss=loss, it=m.iter_, R0=m.model_.R0())
    obs.update({k: v for k, v in zip(m.model_.param_names, m.model_._val_p)})
    coefs.append(obs)

Out:

k=0 iter=418 loss=2.034 coef=[0.42271793 0.071501   0.05083966]
k=2 iter=175 loss=0.796 coef=[0.40845348 0.0651935  0.04301605]
k=4 iter=102 loss=0.261 coef=[0.40296819 0.06122947 0.04159599]
k=6 iter=45 loss=0.030 coef=[0.40062636 0.06029638 0.04028099]
k=8 iter=16 loss=0.005 coef=[0.40009123 0.0600712  0.04011424]
k=10 iter=16 loss=0.001 coef=[0.40001804 0.06000728 0.04002646]
k=12 iter=14 loss=0.022 coef=[0.39993638 0.06000958 0.04006462]
k=14 iter=27 loss=0.002 coef=[0.39999407 0.06001191 0.04001186]
k=16 iter=26 loss=0.288 coef=[0.40002917 0.05989696 0.03989742]
k=18 iter=57 loss=0.115 coef=[0.39997722 0.0600486  0.0400486 ]
k=20 iter=34 loss=0.009 coef=[0.3999938  0.06001091 0.04001089]
k=22 iter=18 loss=0.144 coef=[0.39993481 0.05993534 0.0399695 ]
k=24 iter=17 loss=0.005 coef=[0.40000009 0.06000524 0.04000892]
k=26 iter=20 loss=0.002 coef=[0.39999006 0.0599938  0.03999491]
k=28 iter=17 loss=0.004 coef=[0.40001434 0.06000769 0.0400078 ]
k=30 iter=27 loss=0.028 coef=[0.39998564 0.05998024 0.03998024]
k=32 iter=20 loss=0.000 coef=[0.39999857 0.06000426 0.03999484]
k=34 iter=35 loss=0.077 coef=[0.39930054 0.06001511 0.04001511]
k=36 iter=19 loss=0.015 coef=[0.39926837 0.05998244 0.03998216]
k=38 iter=95 loss=0.885 coef=[0.40828105 0.06017772 0.04017772]
k=40 iter=103 loss=2.420 coef=[0.41954538 0.06036285 0.04036284]

Summary

dfcoef = pandas.DataFrame(coefs).set_index('k')
dfcoef
loss it R0 beta mu nu
k
0 2.033923 418 3.455253 0.422718 0.071501 0.050840
2 0.796498 175 3.774653 0.408453 0.065193 0.043016
4 0.261258 102 3.918953 0.402968 0.061229 0.041596
6 0.030268 45 3.983265 0.400626 0.060296 0.040281
8 0.005480 16 3.993507 0.400091 0.060071 0.040114
10 0.000524 16 3.998831 0.400018 0.060007 0.040026
12 0.021555 14 3.996399 0.399936 0.060010 0.040065
14 0.002255 27 3.998990 0.399994 0.060012 0.040012
16 0.288398 26 4.008534 0.400029 0.059897 0.039897
18 0.114717 57 3.995888 0.399977 0.060049 0.040049
20 0.008524 34 3.999066 0.399994 0.060011 0.040011
22 0.143889 18 4.003158 0.399935 0.059935 0.039969
24 0.004628 17 3.999435 0.400000 0.060005 0.040009
26 0.002407 20 4.000352 0.399990 0.059994 0.039995
28 0.004416 17 3.999524 0.400014 0.060008 0.040008
30 0.028200 27 4.001438 0.399986 0.059980 0.039980
32 0.000467 20 4.000022 0.399999 0.060004 0.039995
34 0.076763 35 3.991799 0.399301 0.060015 0.040015
36 0.014570 19 3.994098 0.399268 0.059982 0.039982
38 0.884513 95 4.068350 0.408281 0.060178 0.040178
40 2.420004 103 4.165227 0.419545 0.060363 0.040363


Plotting the results.

with warnings.catch_warnings():
    warnings.simplefilter("ignore", MatplotlibDeprecationWarning)
    fig, ax = plt.subplots(2, 3, figsize=(14, 6))
    dfcoef[["mu", "nu"]].plot(ax=ax[0, 0], logy=True)
    dfcoef[["beta"]].plot(ax=ax[0, 1], logy=True)
    dfcoef[["loss"]].plot(ax=ax[1, 0], logy=True)
    dfcoef[["R0"]].plot(ax=ax[0, 2])
    df.plot(ax=ax[1, 1], logy=True)
    fig.suptitle('Estimation of R0 over the simulation', fontsize=12)
[figure: Estimation of R0 over the simulation]

Estimating the coefficients is harder at the beginning and at the end of the experiment. A different strategy would probably be needed there.

Varying the estimation window size

The parameter beta was estimated on a 10-day window. Does the estimate change with a shorter or a longer window? On perfect, noise-free data, it should not change much.
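
The sliding windows used for this experiment can be enumerated independently; a small sketch of the slicing (the helper name windows is hypothetical):

```python
def windows(n_obs, delay, step=4):
    """(start, end) pairs of the sliding windows: one window every
    `step` days, each covering `delay` days of observations."""
    return [(k, min(k + delay, n_obs)) for k in range(0, n_obs - delay, step)]

pairs = windows(50, 10)  # 10-day windows over 50 observations
```

Over 50 observations, window lengths 4 through 10 yield 78 windows in total, which matches the 78 rows of the summary below.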

coefs = []
for delay in [4, 5, 6, 7, 8, 9, 10]:
    print('delay', delay)
    for k in range(0, X.shape[0] - delay, 4):
        end = min(k + delay, X.shape[0])
        Xt, yt = X[k:end], y[k:end]
        m, loss = find_best_model(Xt, yt, [1e-3, 1e-4], 10)
        loss = m.score(Xt, yt)
        if k == 0:
            print(
                "k={} iter={} loss={:1.3f} coef={}".format(
                    k, m.iter_, loss, m.model_._val_p))
        obs = dict(k=k, loss=loss, it=m.iter_, R0=m.model_.R0(), delay=delay)
        obs.update({k: v for k, v in zip(
            m.model_.param_names, m.model_._val_p)})
        coefs.append(obs)

Out:

delay 4
k=0 iter=5 loss=1.172 coef=[3.64123716e-01 3.33537580e-02 1.04043054e-04]
delay 5
k=0 iter=5 loss=4.192 coef=[3.25535965e-01 1.12171821e-04 1.09799976e-04]
delay 6
k=0 iter=5 loss=23.815 coef=[5.08268071e-01 1.49927245e-04 3.69222219e-02]
delay 7
k=0 iter=5 loss=16.114 coef=[4.21950604e-01 1.67470974e-04 1.60198816e-04]
delay 8
k=0 iter=5 loss=32.454 coef=[4.85256361e-01 1.17812941e-04 5.59355020e-02]
delay 9
k=0 iter=4 loss=33.867 coef=[0.47169642 0.07580096 0.14599524]
delay 10
k=0 iter=500 loss=50.164 coef=[0.48847332 0.16351531 0.02427033]

Summary

dfcoef = pandas.DataFrame(coefs)
dfcoef

k loss it R0 delay beta mu nu
0 0 1.172286 5 10.883074 4 0.364124 0.033354 0.000104
1 4 55.839432 4 3.291105 4 0.525923 0.139688 0.020113
2 8 6.604077 454 4.253556 4 0.411344 0.052922 0.043784
3 12 2.867941 349 3.896005 4 0.404065 0.062535 0.041178
4 16 0.370953 91 3.986196 4 0.400639 0.060282 0.040225
... ... ... ... ... ... ... ... ...
73 20 0.004141 17 4.000568 10 0.399995 0.059985 0.040000
74 24 0.000309 20 4.000139 10 0.400006 0.059999 0.039999
75 28 0.022005 36 4.000767 10 0.400170 0.060012 0.040012
76 32 0.832540 128 3.979853 10 0.397787 0.059975 0.039975
77 36 8.124877 434 4.133239 10 0.416993 0.060444 0.040444

78 rows × 8 columns



Graphs

with warnings.catch_warnings():
    warnings.simplefilter("ignore", MatplotlibDeprecationWarning)
    fig, ax = plt.subplots(2, 3, figsize=(14, 6))
    # One pivot per quantity: index k, one column (hence one curve)
    # per window size, so no loop over delay is needed.
    dfcoef.pivot(index='k', columns='delay', values='mu').plot(
        ax=ax[0, 0], logy=True, legend=False).set_title('mu')
    dfcoef.pivot(index='k', columns='delay', values='nu').plot(
        ax=ax[0, 1], logy=True, legend=False).set_title('nu')
    dfcoef.pivot(index='k', columns='delay', values='beta').plot(
        ax=ax[0, 2], logy=True, legend=False).set_title('beta')
    dfcoef.pivot(index='k', columns='delay', values='R0').plot(
        ax=ax[1, 2], logy=True, legend=False).set_title('R0')
    ax[1, 2].plot([0, dfcoef['k'].max()], [1, 1], '--', label="R0=1")
    ax[1, 2].set_ylim(0.5, 5)  # a log axis needs a positive bottom limit
    dfcoef.pivot(index='k', columns='delay', values='loss').plot(
        ax=ax[1, 0], logy=True, legend=False).set_title('loss')
    df.plot(ax=ax[1, 1], logy=True)
    fig.suptitle('Estimation of R0 over the simulation '
                 'with several window sizes', fontsize=12)
[figure: Estimation of R0 over the simulation with several window sizes; panels: mu, nu, beta, loss, R0]

Out:

somewhereaftercovid_39_std/aftercovid/examples/plot_covid_estimation.py:194: UserWarning: Attempted to set non-positive bottom ylim on a log-scaled axis.
Invalid limit will be ignored.
  ax[1, 2].set_ylim(0, 5)

The plot lacks a legend. That will come later.
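
One way to add the missing legend, sketched on a toy frame standing in for dfcoef (the pivot column names, here the window sizes, become the legend entries):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch
import pandas

# Toy frame: one mu estimate per (k, delay) pair.
toy = pandas.DataFrame({"k": [0, 0, 4, 4],
                        "delay": [4, 10, 4, 10],
                        "mu": [0.03, 0.06, 0.05, 0.06]})
# Plot with legend=True instead of legend=False: one entry per delay.
ax = toy.pivot(index="k", columns="delay", values="mu").plot(legend=True)
labels = [t.get_text() for t in ax.get_legend().get_texts()]
```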

Noisy data

The idea is to check whether the estimation behaves as well on noisy data.

Xeps = CovidSIRD.add_noise(X, epsilon=1.)
yeps = numpy.vstack([Xeps[1:] - Xeps[:-1], y[-1:]])
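
The second line rebuilds the derivative observations from the noisy states by one-step finite differences (the last true derivative is appended so the shapes match). On a toy array:

```python
import numpy

# One-step finite differences: dX[t] ~ X[t+1] - X[t] for a unit time step.
X = numpy.array([[10.0, 0.0],
                 [8.0, 2.0],
                 [7.0, 3.0]])
dX = X[1:] - X[:-1]
```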

We run the estimation again.

coefs = []
for k in range(0, X.shape[0] - 9, 2):
    end = min(k + 10, X.shape[0])
    Xt, yt = Xeps[k:end], yeps[k:end]
    m, loss = find_best_model(Xt, yt, [1e-2, 1e-3, 1e-4], 10)
    loss = m.score(Xt, yt)
    print(
        "k={} iter={} loss={:1.3f} coef={}".format(
            k, m.iter_, loss, m.model_._val_p))
    obs = dict(k=k, loss=loss, it=m.iter_, R0=m.model_.R0())
    obs.update({k: v for k, v in zip(
        m.model_.param_names, m.model_._val_p)})
    coefs.append(obs)

dfcoef = pandas.DataFrame(coefs).set_index('k')
dfcoef

Out:

k=0 iter=180 loss=5.169 coef=[0.4361254  0.06800805 0.05212197]
k=2 iter=200 loss=10.344 coef=[0.40012254 0.05045164 0.06008218]
k=4 iter=73 loss=27.886 coef=[0.40542969 0.06465982 0.04227415]
k=6 iter=34 loss=49.505 coef=[0.40314012 0.06033077 0.04179862]
k=8 iter=72 loss=126.517 coef=[0.40844004 0.07032129 0.03323232]
k=10 iter=61 loss=332.381 coef=[0.4012656  0.05665731 0.04369407]
k=12 iter=86 loss=658.490 coef=[0.40090382 0.05976982 0.03886393]
k=14 iter=189 loss=769.851 coef=[0.40126343 0.06395087 0.03399441]
k=16 iter=500 loss=968.587 coef=[0.39661892 0.0681803  0.02932439]
k=18 iter=182 loss=1526.657 coef=[0.39536372 0.067393   0.03046092]
k=20 iter=140 loss=1811.868 coef=[0.39397623 0.06409732 0.03245279]
k=22 iter=248 loss=3349.721 coef=[0.38852558 0.06514088 0.0317401 ]
k=24 iter=500 loss=8206.009 coef=[0.38864055 0.06395613 0.03655347]
k=26 iter=500 loss=14141.409 coef=[0.39158324 0.05967827 0.04515903]
k=28 iter=500 loss=14993.627 coef=[0.38814958 0.06127081 0.03846429]
k=30 iter=500 loss=17379.880 coef=[0.3878659  0.05820326 0.04390373]
k=32 iter=500 loss=16903.245 coef=[0.45654911 0.05713588 0.0551572 ]
k=34 iter=500 loss=15729.130 coef=[0.55865476 0.06078938 0.04947198]
k=36 iter=500 loss=13302.022 coef=[0.51200123 0.06314366 0.04144467]
k=38 iter=183 loss=10094.243 coef=[0.41673263 0.05637771 0.05789614]
k=40 iter=399 loss=5809.703 coef=[0.77490492 0.05653062 0.06782912]
loss it R0 beta mu nu
k
0 5.168514 180 3.630445 0.436125 0.068008 0.052122
2 10.343539 200 3.619910 0.400123 0.050452 0.060082
4 27.885923 73 3.791402 0.405430 0.064660 0.042274
6 49.505285 34 3.947347 0.403140 0.060331 0.041799
8 126.516523 72 3.944238 0.408440 0.070321 0.033232
10 332.380949 61 3.998606 0.401266 0.056657 0.043694
12 658.489932 86 4.064571 0.400904 0.059770 0.038864
14 769.850637 189 4.096812 0.401263 0.063951 0.033994
16 968.586756 500 4.067691 0.396619 0.068180 0.029324
18 1526.657223 182 4.040346 0.395364 0.067393 0.030461
20 1811.868495 140 4.080536 0.393976 0.064097 0.032453
22 3349.720560 248 4.010339 0.388526 0.065141 0.031740
24 8206.009092 500 3.866701 0.388641 0.063956 0.036553
26 14141.409380 500 3.735152 0.391583 0.059678 0.045159
28 14993.626752 500 3.891805 0.388150 0.061271 0.038464
30 17379.879530 500 3.798623 0.387866 0.058203 0.043904
32 16903.244642 500 4.065693 0.456549 0.057136 0.055157
34 15729.130445 500 5.066641 0.558655 0.060789 0.049472
36 13302.022391 500 4.895395 0.512001 0.063144 0.041445
38 10094.243024 183 3.646789 0.416733 0.056378 0.057896
40 5809.702661 399 6.231156 0.774905 0.056531 0.067829


Graphs.

dfeps = pandas.DataFrame({_[0]: x for _, x in zip(model.Q, Xeps.T)})

with warnings.catch_warnings():
    warnings.simplefilter("ignore", MatplotlibDeprecationWarning)
    fig, ax = plt.subplots(2, 3, figsize=(14, 6))
    dfcoef[["mu", "nu"]].plot(ax=ax[0, 0], logy=True)
    dfcoef[["beta"]].plot(ax=ax[0, 1], logy=True)
    dfcoef[["loss"]].plot(ax=ax[1, 0], logy=True)
    dfcoef[["R0"]].plot(ax=ax[0, 2])
    dfeps.plot(ax=ax[1, 1])
    fig.suptitle(
        'Estimation of R0 over the simulation '
        'on noisy data', fontsize=12)
[figure: Estimation of R0 over the simulation on noisy data]

Total running time of the script: ( 3 minutes 25.140 seconds)

Gallery generated by Sphinx-Gallery