{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Data, functional approaches - solution\n", "\n", "Solution for the functional approach. It relies mainly on iterators and the [cytoolz](https://pypi.python.org/pypi/cytoolz) module."]}, {"cell_type": "code", "execution_count": 1, "metadata": {"collapsed": false}, "outputs": [], "source": ["%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('ggplot')\n", "import pyensae\n", "from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The notebook uses data from a mortality table taken from [table de mortalit\u00e9 de 1960 \u00e0 2010](http://www.data-publica.com/opendata/7098--population-et-conditions-sociales-table-de-mortalite-de-1960-a-2010), retrieved with the function [table_mortalite_euro_stat](http://www.xavierdupre.fr/app/actuariat_python/helpsphinx/actuariat_python/data/population.html#actuariat_python.data.population.table_mortalite_euro_stat)."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 1: application to large databases\n", "\n", "Suppose we have a database with 10 billion rows. We need to apply two processing steps to it: ``f1`` and ``f2``. There are two options:\n", "\n", "* Apply the function ``f1`` to every element, then apply ``f2`` to every element transformed by ``f1``.\n", "* Apply the composition of the generators ``f1`` and ``f2`` to each row of the database, one row at a time.\n", "\n", "What happens if the implementation of ``f2`` contains a bug?"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"collapsed": true}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {"collapsed": true}, "source": ["## Exercise 2: cytoolz\n", "\n", "A contestant's score in a figure skating competition is the average of three of the five marks awarded, the two extremes (the lowest and the highest) being discarded. Compute this average for a set of contestants with [cytoolz](https://pypi.python.org/pypi/cytoolz)."]}, {"cell_type": "code", "execution_count": 3, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/html": ["<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>juge</th>\n", "      <th>nom</th>\n", "      <th>note</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>1</td>\n", "      <td>A</td>\n", "      <td>8</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>2</td>\n", "      <td>A</td>\n", "      <td>9</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>3</td>\n", "      <td>A</td>\n", "      <td>7</td>\n", "    </tr>\n", "    <tr>\n", "      <th>3</th>\n", "      <td>4</td>\n", "      <td>A</td>\n", "      <td>4</td>\n", "    </tr>\n", "    <tr>\n", "      <th>4</th>\n", "      <td>5</td>\n", "      <td>A</td>\n", "      <td>5</td>\n", "    </tr>\n", "    <tr>\n", "      <th>5</th>\n", "      <td>1</td>\n", "      <td>B</td>\n", "      <td>7</td>\n", "    </tr>\n", "    <tr>\n", "  
    <th>6</th>\n", "      <td>2</td>\n", "      <td>B</td>\n", "      <td>4</td>\n", "    </tr>\n", "    <tr>\n", "      <th>7</th>\n", "      <td>3</td>\n", "      <td>B</td>\n", "      <td>7</td>\n", "    </tr>\n", "    <tr>\n", "      <th>8</th>\n", "      <td>4</td>\n", "      <td>B</td>\n", "      <td>9</td>\n", "    </tr>\n", "    <tr>\n", "      <th>9</th>\n", "      <td>1</td>\n", "      <td>B</td>\n", "      <td>10</td>\n", "    </tr>\n", "    <tr>\n", "      <th>10</th>\n", "      <td>2</td>\n", "      <td>C</td>\n", "      <td>0</td>\n", "    </tr>\n", "    <tr>\n", "      <th>11</th>\n", "      <td>3</td>\n", "      <td>C</td>\n", "      <td>10</td>\n", "    </tr>\n", "    <tr>\n", "      <th>12</th>\n", "      <td>4</td>\n", "      <td>C</td>\n", "      <td>8</td>\n", "    </tr>\n", "    <tr>\n", "      <th>13</th>\n", "      <td>5</td>\n", "      <td>C</td>\n", "      <td>8</td>\n", "    </tr>\n", "    <tr>\n", "      <th>14</th>\n", "      <td>5</td>\n", "      <td>C</td>\n", "      <td>8</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>"], "text/plain": ["    juge nom  note\n", "0      1   A     8\n", "1      2   A     9\n", "2      3   A     7\n", "3      4   A     4\n", "4      5   A     5\n", "5      1   B     7\n", "6      2   B     4\n", "7      3   B     7\n", "8      4   B     9\n", "9      1   B    10\n", "10     2   C     0\n", "11     3   C    10\n", "12     4   C     8\n", "13     5   C     8\n", "14     5   C     8"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["notes = [dict(nom=\"A\", juge=1, note=8),\n", "        dict(nom=\"A\", juge=2, note=9),\n", "        dict(nom=\"A\", juge=3, note=7),\n", "        dict(nom=\"A\", juge=4, note=4),\n", "        dict(nom=\"A\", juge=5, note=5),\n", "        dict(nom=\"B\", juge=1, note=7),\n", "        dict(nom=\"B\", juge=2, note=4),\n", "        dict(nom=\"B\", juge=3, note=7),\n", "        dict(nom=\"B\", juge=4, note=9),\n", "        
dict(nom=\"B\", juge=1, note=10),\n", "        dict(nom=\"C\", juge=2, note=0),\n", "        dict(nom=\"C\", juge=3, note=10),\n", "        dict(nom=\"C\", juge=4, note=8),\n", "        dict(nom=\"C\", juge=5, note=8),\n", "        dict(nom=\"C\", juge=5, note=8),\n", "        ]\n", "\n", "import pandas\n", "pandas.DataFrame(notes)"]}, {"cell_type": "code", "execution_count": 4, "metadata": {"collapsed": true}, "outputs": [], "source": ["import cytoolz.itertoolz as itz\n", "import cytoolz.dicttoolz as dtz\n", "from functools import reduce\n", "from operator import add"]}, {"cell_type": "code", "execution_count": 5, "metadata": {"collapsed": false}, "outputs": [{"data": {"text/plain": ["{'A': 6.666666666666667, 'B': 7.666666666666667, 'C': 8.0}"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["# Group the rows by contestant name: {nom: [rows]}.\n", "gr = itz.groupby(lambda d: d[\"nom\"], notes)\n", "\n", "def select_note(key_value):\n", "    # Keep only the marks of each contestant (lazy map).\n", "    key, value = key_value\n", "    return key, map(lambda d: d[\"note\"], value)\n", "\n", "gr_notes = dtz.itemmap(select_note, gr)\n", "\n", "def enleve_extreme(key_value):\n", "    # Sort the marks, drop the lowest and keep the next three;\n", "    # with exactly five marks this also discards the highest.\n", "    key, value = key_value\n", "    return key, itz.take(3, itz.drop(1, sorted(value)))\n", "\n", "def moyenne(key_value):\n", "    # Average of the three remaining marks.\n", "    key, value = key_value\n", "    return key, reduce(add, value) / 3\n", "\n", "no_ext = dtz.itemmap(enleve_extreme, gr_notes)\n", "\n", "moy = dtz.itemmap(moyenne, no_ext)\n", "moy"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2"}}, "nbformat": 4, "nbformat_minor": 2}