{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Data, functional approaches - solution\n", "\n", "Solution for the functional approach. It relies mainly on iterators and the [cytoolz](https://pypi.python.org/pypi/cytoolz) module."]}, {"cell_type": "code", "execution_count": 1, "metadata": {"collapsed": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["Populating the interactive namespace from numpy and matplotlib\n"]}, {"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('ggplot')\n", "import pyensae\n", "from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The notebook uses data from a mortality table extracted from [table de mortalit\u00e9 de 1960 \u00e0 2010](http://www.data-publica.com/opendata/7098--population-et-conditions-sociales-table-de-mortalite-de-1960-a-2010), retrieved with the function [table_mortalite_euro_stat](http://www.xavierdupre.fr/app/actuariat_python/helpsphinx/actuariat_python/data/population.html#actuariat_python.data.population.table_mortalite_euro_stat)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 1: application to large databases\n", "\n", "Suppose we have a database of 10 billion rows. Two processing steps, ``f1`` and ``f2``, must be applied to it. There are two options:\n", "\n", "* Apply the function ``f1`` to all elements, then apply ``f2`` to all elements transformed by ``f1``.\n", "* Apply the combination of the generators ``f1``, ``f2`` to each row of the database.\n", "\n", "What happens if the function ``f2`` contains an implementation error?"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"collapsed": true}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {"collapsed": true}, "source": ["## Exercise 2: cytoolz\n", "\n", "A candidate's score in a figure skating competition is the mean of three of the five judges' scores, the two extremes being discarded. This mean must be computed for a set of candidates with [cytoolz](https://pypi.python.org/pypi/cytoolz)."]}, {"cell_type": "code", "execution_count": 3, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/html": ["
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>juge</th><th>nom</th><th>note</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>A</td><td>8</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>A</td><td>9</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>A</td><td>7</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>A</td><td>4</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>A</td><td>5</td></tr>\n",
"    <tr><th>5</th><td>1</td><td>B</td><td>7</td></tr>\n",
"    <tr><th>6</th><td>2</td><td>B</td><td>4</td></tr>\n",
"    <tr><th>7</th><td>3</td><td>B</td><td>7</td></tr>\n",
"    <tr><th>8</th><td>4</td><td>B</td><td>9</td></tr>\n",
"    <tr><th>9</th><td>1</td><td>B</td><td>10</td></tr>\n",
"    <tr><th>10</th><td>2</td><td>C</td><td>0</td></tr>\n",
"    <tr><th>11</th><td>3</td><td>C</td><td>10</td></tr>\n",
"    <tr><th>12</th><td>4</td><td>C</td><td>8</td></tr>\n",
"    <tr><th>13</th><td>5</td><td>C</td><td>8</td></tr>\n",
"    <tr><th>14</th><td>5</td><td>C</td><td>8</td></tr>\n",
"  </tbody>\n",
"</table>
"], "text/plain": [" juge nom note\n", "0 1 A 8\n", "1 2 A 9\n", "2 3 A 7\n", "3 4 A 4\n", "4 5 A 5\n", "5 1 B 7\n", "6 2 B 4\n", "7 3 B 7\n", "8 4 B 9\n", "9 1 B 10\n", "10 2 C 0\n", "11 3 C 10\n", "12 4 C 8\n", "13 5 C 8\n", "14 5 C 8"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["notes = [dict(nom=\"A\", juge=1, note=8),\n", " dict(nom=\"A\", juge=2, note=9),\n", " dict(nom=\"A\", juge=3, note=7),\n", " dict(nom=\"A\", juge=4, note=4),\n", " dict(nom=\"A\", juge=5, note=5),\n", " dict(nom=\"B\", juge=1, note=7),\n", " dict(nom=\"B\", juge=2, note=4),\n", " dict(nom=\"B\", juge=3, note=7),\n", " dict(nom=\"B\", juge=4, note=9),\n", " dict(nom=\"B\", juge=1, note=10),\n", " dict(nom=\"C\", juge=2, note=0),\n", " dict(nom=\"C\", juge=3, note=10),\n", " dict(nom=\"C\", juge=4, note=8),\n", " dict(nom=\"C\", juge=5, note=8), \n", " dict(nom=\"C\", juge=5, note=8), \n", " ]\n", "\n", "import pandas\n", "pandas.DataFrame(notes)"]}, {"cell_type": "code", "execution_count": 4, "metadata": {"collapsed": true}, "outputs": [], "source": ["import cytoolz.itertoolz as itz\n", "import cytoolz.dicttoolz as dtz\n", "from functools import reduce\n", "from operator import add"]}, {"cell_type": "code", "execution_count": 5, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/plain": ["{'A': 6.666666666666667, 'B': 7.666666666666667, 'C': 8.0}"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["gr = itz.groupby(lambda d: d[\"nom\"], notes)\n", "\n", "def select_note(key_value):\n", " key, value = key_value\n", " return key, map(lambda d: d[\"note\"], value)\n", "\n", "gr_notes = dtz.itemmap(select_note, gr)\n", "\n", "def enleve_extreme(key_value):\n", " key, value = key_value\n", " return key, itz.take(3, itz.drop(1,sorted(value)))\n", "\n", "def moyenne(key_value):\n", " key, value = key_value\n", " return key, reduce(add, value)/3\n", "\n", "no_ext = dtz.itemmap( enleve_extreme, gr_notes)\n", 
"\n", "moy = dtz.itemmap( moyenne, no_ext)\n", "moy"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2"}}, "nbformat": 4, "nbformat_minor": 2}