module datasets.eurostat

Short summary

module sparkouille.datasets.eurostat

Datasets from Eurostat.

source on GitHub

Functions

function – truncated documentation

  • table_mortalite_euro_stat – This function retrieves the mortality table from Eurostat through table de mortalité

Documentation

Datasets from Eurostat.


sparkouille.datasets.eurostat.table_mortalite_euro_stat(url='http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=data/', name='demo_mlifetable.tsv.gz', final_name='mortalite.txt', whereTo='.', stop_at=None, fLOG=<function noLOG>)

This function retrieves the mortality table from Eurostat through table de mortalité (mortality table). That link is currently broken: data-publica no longer provides this database, so a copy is provided instead.

Parameters:
  • url – data source

  • name – data table name

  • final_name – the data is compressed and must be uncompressed into a file; this parameter defines that file's name

  • whereTo – directory where the downloaded data is stored

  • stop_at – the overall process is quite long; if not None, only the first rows are kept

  • fLOG – logging function

Returns:

data_frame

The function first checks whether the file final_name exists. If it does, the data is not downloaded again. The header has an unusual format: the first column packs several coordinates separated by commas:

indic_de,sex,age,geo\time    2013     2012     2011     2010     2009

The data must be preprocessed to split this information into separate columns. The overall process takes 4-5 minutes: about 10 seconds to download (< 10 MB) and 4-5 minutes to preprocess the data (this step could be improved). The processed data contains the following columns:

['annee', 'valeur', 'age', 'age_num', 'indicateur', 'genre', 'pays']
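The reshaping described above can be illustrated with a minimal sketch. It is not the library's actual implementation: the helper names `age_to_num` and `to_long_rows` are hypothetical, the sample values are invented, and the age codes (`Y_LT1`, `Y1`, `Y_GE85`) and the `:` marker for missing values are assumptions about the Eurostat file layout rather than facts stated on this page.

```python
import csv
import io

# Tiny stand-in for the uncompressed TSV file. Values are invented for illustration.
SAMPLE = (
    "indic_de,sex,age,geo\\time\t2013 \t2012 \n"
    "LIFEXP,F,Y1,FR\t85.0 \t84.8 \n"
    "LIFEXP,M,Y_GE85,FR\t6.1 \t: \n"
)


def age_to_num(age):
    """Map an age code to a number (assumed codes: 'Y_LT1', 'Y12', 'Y_GE85')."""
    if age == "Y_LT1":
        return 0
    if age.startswith("Y_GE"):
        return int(age[4:])  # open-ended group, e.g. 85 and above
    return int(age[1:])


def to_long_rows(handle):
    """Yield one dict per (line, year) cell with the seven columns listed above."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    years = [int(cell.strip()) for cell in header[1:]]
    for line in reader:
        # The first cell packs four coordinates separated by commas.
        indicateur, genre, age, pays = line[0].split(",")
        for annee, cell in zip(years, line[1:]):
            cell = cell.strip()
            if cell in ("", ":"):  # assumed marker for a missing value
                continue
            yield {
                "annee": annee,
                "valeur": float(cell),
                "age": age,
                "age_num": age_to_num(age),
                "indicateur": indicateur,
                "genre": genre,
                "pays": pays,
            }


rows = list(to_long_rows(io.StringIO(SAMPLE)))
```

Each wide line becomes one row per year, which is why the long table is much larger than the downloaded file. A real file may also carry flag letters next to some values, which this sketch does not handle.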

Columns age and age_num look alike. age_num is numeric and equal to age except when age_num is 85: everybody above that age falls into the same category. The table contains many indicators:

  • PROBSURV: Probability of surviving between two exact ages (px)

  • LIFEXP: Life expectancy at exact age (ex)

  • SURVIVORS: Number of survivors at exact age (lx)

  • PYLIVED: Number of person-years lived between two exact ages (Lx)

  • DEATHRATE: Mortality rate at age x (Mx)

  • PROBDEATH: Probability of dying between two exact ages (qx)

  • TOTPYLIVED: Total number of person-years lived after exact age (Tx)
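These indicators are linked by the usual textbook life-table relationships, which the following sketch illustrates. It is only an illustration under stated assumptions: the function name `life_table` and the `radix` parameter are hypothetical, deaths are assumed to be spread evenly over each year, the open-ended last age group is not treated specially, and Eurostat's own methodology may differ.

```python
def life_table(qx, radix=100_000):
    """Derive lx, Lx, Tx and ex from one-year death probabilities qx."""
    # Survivors: l(x+1) = l(x) * p(x), with p(x) = 1 - q(x).
    lx = [float(radix)]
    for q in qx:
        lx.append(lx[-1] * (1.0 - q))

    # Person-years lived between ages x and x+1, assuming deaths occur
    # on average halfway through the year.
    Lx = [(a + b) / 2.0 for a, b in zip(lx[:-1], lx[1:])]

    # Total person-years lived after age x: reverse cumulative sum of Lx.
    Tx = []
    total = 0.0
    for person_years in reversed(Lx):
        total += person_years
        Tx.append(total)
    Tx.reverse()

    # Life expectancy at exact age x: e(x) = T(x) / l(x).
    ex = [t / l for t, l in zip(Tx, lx)]
    return {"lx": lx[:-1], "Lx": Lx, "Tx": Tx, "ex": ex}


# Toy two-age population: half die in the first year, the rest in the second.
toy = life_table([0.5, 1.0])
```

With the toy input, the life expectancy at the first age is one year, which gives a quick sanity check on the formulas before applying them to the PROBDEATH values of the real table.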
