Multidimensional cube - solutions
OLAP-style manipulation of mortality tables: solutions to the exercises.
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import pyensae
from pyquickhelper.helpgen import NbImage
from jyquickhelper import add_notebook_menu
add_notebook_menu()
Populating the interactive namespace from numpy and matplotlib
We read the data, then rebuild a Dataset:
from actuariat_python.data import table_mortalite_euro_stat
table_mortalite_euro_stat()
import pandas
df = pandas.read_csv("mortalite.txt", sep="\t", encoding="utf8", low_memory=False)
df2 = df[["annee", "age_num","indicateur","pays","genre","valeur"]].dropna().reset_index(drop=True)
piv = df2.pivot_table(index=["annee", "age_num","pays","genre"],
columns=["indicateur"],
values="valeur")
import xarray
ds = xarray.Dataset.from_dataframe(piv)
ds
<xarray.Dataset>
Dimensions: (age_num: 84, annee: 54, genre: 3, pays: 54)
Coordinates:
* annee (annee) int64 1960 1961 1962 1963 1964 1965 1966 1967 1968 ...
* age_num (age_num) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
* pays (pays) object 'AM' 'AT' 'AZ' 'BE' 'BG' 'BY' 'CH' 'CY' 'CZ' ...
* genre (genre) object 'F' 'M' 'T'
Data variables:
DEATHRATE (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
LIFEXP (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
PROBDEATH (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
PROBSURV (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
PYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
SURVIVORS (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
TOTPYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
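The pivot-then-convert step above can be tried on a tiny table. This is a minimal sketch, assuming pandas and xarray are installed; the frame `long_df` and its values are invented for illustration and are not taken from mortalite.txt. It suggests why the real Dataset displays so many `nan` values: `from_dataframe` builds the full Cartesian product of the index levels and fills missing combinations with NaN.

```python
import pandas as pd
import xarray as xr

# Hypothetical long-format table: one row per (annee, pays, indicateur).
long_df = pd.DataFrame({
    "annee": [2000, 2000, 2000, 2000, 2001],
    "pays": ["BE", "BE", "FR", "FR", "BE"],
    "indicateur": ["LIFEXP", "DEATHRATE", "LIFEXP", "DEATHRATE", "LIFEXP"],
    "valeur": [80.0, 0.01, 82.0, 0.009, 80.5],
})

# One column per indicator, indexed by the future dimensions of the cube.
wide = long_df.pivot_table(index=["annee", "pays"],
                           columns=["indicateur"],
                           values="valeur")

# Each index level becomes a dimension, each column a data variable.
cube = xr.Dataset.from_dataframe(wide)

# (2001, "FR") never appears in long_df, so the cube holds NaN there.
missing = cube["LIFEXP"].sel(annee=2001, pays="FR")
print(cube)
print(missing.values)
```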
Exercise 1: what do the following lines do?
The following program averages over one dimension of the Dataset (the country, pays), aligns the result with the original Dataset using xarray.align, and then adds a variable meanp containing that average.
ds.assign(LIFEEXP_add = ds.LIFEXP-1)
<xarray.Dataset>
Dimensions: (age_num: 84, annee: 54, genre: 3, pays: 54)
Coordinates:
* annee (annee) int64 1960 1961 1962 1963 1964 1965 1966 1967 1968 ...
* age_num (age_num) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
* pays (pays) object 'AM' 'AT' 'AZ' 'BE' 'BG' 'BY' 'CH' 'CY' 'CZ' ...
* genre (genre) object 'F' 'M' 'T'
Data variables:
DEATHRATE (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
LIFEXP (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
PROBDEATH (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
PROBSURV (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
PYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
SURVIVORS (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
TOTPYLIVED (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
LIFEEXP_add (annee, age_num, pays, genre) float64 nan nan nan nan nan ...
meanp = ds.mean(dim="pays")
ds1, ds2 = xarray.align(ds, meanp, join='outer')
joined = ds1.assign(meanp = ds2["LIFEXP"])
joined.to_dataframe().head()
age_num | annee | genre | pays | DEATHRATE | LIFEXP | PROBDEATH | PROBSURV | PYLIVED | SURVIVORS | TOTPYLIVED | meanp |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1960 | F | AM | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.52 |
1 | 1960 | F | AT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.52 |
1 | 1960 | F | AZ | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 73.52 |
1 | 1960 | F | BE | 0.00159 | 73.7 | 0.00159 | 0.99841 | 97316 | 97393 | 7179465 | 73.52 |
1 | 1960 | F | BG | 0.00652 | 73.2 | 0.00650 | 0.99350 | 95502 | 95813 | 7017023 | 73.52 |
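For fixed annee, age_num and genre, the meanp values are the same for every country: meanp no longer has a pays dimension, so selecting on the three remaining coordinates returns a single number.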
joined.sel(annee=2000, age_num=59, genre='F')["meanp"]
<xarray.DataArray 'meanp' ()>
array(23.83243243243243)
Coordinates:
annee int64 2000
genre object 'F'
age_num float64 59.0
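The behaviour observed in this exercise can be reproduced on a toy cube. This is a minimal sketch, assuming pandas, NumPy and xarray are installed; the names `toy`, `toy_ds`, `toy_mean` and `toy_joined` and all values are invented for illustration. It shows that the average over pays ignores NaN by default, that the variable added with assign keeps only the annee dimension, and that to_dataframe repeats it for every country.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical data: life expectancy for 3 countries over 2 years,
# with one missing value (FR in 2000).
toy = pd.DataFrame({
    "annee": [2000, 2000, 2000, 2001, 2001, 2001],
    "pays": ["BE", "BG", "FR", "BE", "BG", "FR"],
    "LIFEXP": [80.0, 75.0, np.nan, 81.0, 76.0, 83.0],
}).set_index(["annee", "pays"])
toy_ds = xr.Dataset.from_dataframe(toy)

# Average over countries; NaN values are skipped by default for floats.
toy_mean = toy_ds.mean(dim="pays")

# Same steps as in the exercise: align, then attach the mean as a new variable.
left, right = xr.align(toy_ds, toy_mean, join="outer")
toy_joined = left.assign(meanp=right["LIFEXP"])

# meanp depends on annee only; LIFEXP still depends on annee and pays.
print(toy_joined["meanp"].dims)
print(toy_joined["LIFEXP"].dims)

# to_dataframe repeats meanp on every (annee, pays) row.
print(toy_joined.to_dataframe())
```

In this sketch the mean for 2000 is computed from BE and BG only, since FR is missing, which matches the table above where meanp is defined even for countries whose own row is NaN.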