{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.ml - 2017 - Data preparation\n", "\n", "This notebook explains how the data for the 2017 competition were prepared. The data are first downloaded from the [OpenFoodFacts](https://world.openfoodfacts.org/data) website."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## What it looks like"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/plain": ["(0.938696850091219, 'Go')"]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["import os\n", "os.stat(\"c:/temp/fr.openfoodfacts.org.products.csv\").st_size / 2**30, 'Go'"]}, {"cell_type": "markdown", "metadata": {}, "source": ["It is big: almost 1 GB (*Go*, for *gigaoctet*, is the French abbreviation of GB)."]}, {"cell_type": "code", "execution_count": 3, "metadata": {"collapsed": true}, "outputs": [], "source": ["import pyensae"]}, {"cell_type": "code", "execution_count": 4, "metadata": {"collapsed": true}, "outputs": [], "source": ["%load_ext pyensae"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "code\turl\tcreator\tcreated_t\tcreated_datetime\tlast_modified_t\tlast_modified_datetime\tproduct_name\tgeneric_name\tquantity\tpackaging\tpackaging_tags\tbrands\tbrands_tags\tcategories\tcategories_tags\tcategories_fr\torigins\torigins_tags\tmanufacturing_places\tmanufacturing_places_tags\tlabels\tlabels_tags\tlabels_fr\temb_codes\temb_codes_tags\tfirst_packaging_code_geo\tcities\tcities_tags\tpurchase_places\tstores\tcountries\tcountries_tags\tcountries_fr\tingredients_text\tallergens\tallergens_fr\ttraces\ttraces_tags\ttraces_fr\tserving_size\tno_nutriments\tadditives_n\tadditives\tadditives_tags\tadditives_fr\tingredients_from_palm_oil_n\tingredients_from_palm_oil\tingredients_from_palm_oil_tags\tingredients_that_may_be_from_palm_oil_n\tingredients_that_may_be_from_palm_oil\tingredients_that_may_be_from_palm_oil_tags\tnutrition_grade_uk\tnutrition_grade_fr\tpnns_groups_1\tpnns_groups_2\tstates\tstates_tags\tstates_fr\tmain_category\tmain_category_fr\timage_url\timage_small_url\tenergy_100g\tenergy-from-fat_100g\tfat_100g\tsaturated-fat_100g\tbutyric-acid_100g\tcaproic-acid_100g\tcaprylic-acid_100g\tcapric-acid_100g\tlauric-acid_100g\tmyristic-acid_100g\tpalmitic-acid_100g\tstearic-acid_100g\tarachidic-acid_100g\tbehenic-acid_100g\tlignoceric-acid_100g\tcerotic-acid_100g\tmontanic-acid_100g\tmelissic-acid_100g\tmonounsaturated-fat_100g\tpolyunsaturated-fat_100g\tomega-3-fat_100g\talpha-linolenic-acid_100g\teicosapentaenoic-acid_100g\tdocosahexaenoic-acid_100g\tomega-6-fat_100g\tlinoleic-acid_100g\tarachidonic-acid_100g\tgamma-linolenic-acid_100g\tdihomo-gamma-linolenic-acid_100g\tomega-9-fat_100g\toleic-acid_100g\telaidic-acid_100g\tgondoic-acid_100g\tmead-acid_100g\terucic-acid_100g\tnervonic-acid_100g\ttrans-fat_100g\tcholesterol_100g\tcarbohydrates_100g\tsugars_100g\tsucrose_100g\tglucose_100g\tfructose_100g\tlactose_100g\tmaltose_100g\tmaltodextrins_100g\tstarch_100g\tpolyols_100g\tfiber_100g\tproteins_100g\tcasein_100g\tserum-proteins_100g\tnucleotides
_100g\tsalt_100g\tsodium_100g\talcohol_100g\tvitamin-a_100g\tbeta-carotene_100g\tvitamin-d_100g\tvitamin-e_100g\tvitamin-k_100g\tvitamin-c_100g\tvitamin-b1_100g\tvitamin-b2_100g\tvitamin-pp_100g\tvitamin-b6_100g\tvitamin-b9_100g\tfolates_100g\tvitamin-b12_100g\tbiotin_100g\tpantothenic-acid_100g\tsilica_100g\tbicarbonate_100g\tpotassium_100g\tchloride_100g\tcalcium_100g\tphosphorus_100g\tiron_100g\tmagnesium_100g\tzinc_100g\tcopper_100g\tmanganese_100g\tfluoride_100g\tselenium_100g\tchromium_100g\tmolybdenum_100g\tiodine_100g\tcaffeine_100g\ttaurine_100g\tph_100g\tfruits-vegetables-nuts_100g\tfruits-vegetables-nuts-estimate_100g\tcollagen-meat-protein-ratio_100g\tcocoa_100g\tchlorophyl_100g\tcarbon-footprint_100g\tnutrition-score-fr_100g\tnutrition-score-uk_100g\tglycemic-index_100g\twater-hardness_100g\n", "0000000003087\thttp://world-fr.openfoodfacts.org/produit/0000000003087/farine-de-ble-noir-ferme-t-y-r-nao\topenfoodfacts-contributors\t1474103866\t2016-09-17T09:17:46Z\t1474103893\t2016-09-17T09:18:13Z\tFarine de bl\u00e9 noir\t\t1kg\t\t\tFerme t'y R'nao\tferme-t-y-r-nao\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\ten:FR\ten:france\tFrance\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\ten:to-be-completed, en:nutrition-facts-to-be-completed, en:ingredients-to-be-completed, en:expiration-date-to-be-completed, en:characteristics-to-be-completed, en:categories-to-be-completed, en:brands-completed, en:packaging-to-be-completed, en:quantity-completed, en:product-name-completed, en:photos-to-be-validated, en:photos-uploaded\ten:to-be-completed,en:nutrition-facts-to-be-completed,en:ingredients-to-be-completed,en:expiration-date-to-be-completed,en:characteristics-to-be-completed,en:categories-to-be-completed,en:brands-completed,en:packaging-to-be-completed,en:quantity-completed,en:product-name-completed,en:photos-to-be-validated,en:photos-uploaded\tA compl\u00e9ter,Informations nutritionnelles \u00e0 compl\u00e9ter,Ingr\u00e9dients \u00e0 compl\u00e9ter,Date limite \u00e0 
compl\u00e9ter,Caract\u00e9ristiques \u00e0 compl\u00e9ter,Cat\u00e9gories \u00e0 compl\u00e9ter,Marques compl\u00e9t\u00e9es,Emballage \u00e0 compl\u00e9ter,Quantit\u00e9 compl\u00e9t\u00e9e,Nom du produit complete,Photos \u00e0 valider,Photos envoy\u00e9es\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n", "\n", "
"], "text/plain": [""]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["%head -n 2 c:/temp/fr.openfoodfacts.org.products.csv"]}, {"cell_type": "code", "execution_count": 6, "metadata": {"collapsed": true}, "outputs": [], "source": ["import pandas"]}, {"cell_type": "code", "execution_count": 7, "metadata": {"collapsed": true}, "outputs": [], "source": ["df = pandas.read_csv(\"c:/temp/fr.openfoodfacts.org.products.csv\", \n", " sep=\"\\t\", encoding=\"utf-8\", nrows=10000, low_memory=False)"]}, {"cell_type": "code", "execution_count": 8, "metadata": {"collapsed": true}, "outputs": [], "source": ["df.head().T.to_excel(\"e.xlsx\")"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/html": ["
"], "text/plain": ["Empty DataFrame\n", "Columns: []\n", "Index: [code, url, creator, created_t, created_datetime, last_modified_t, last_modified_datetime, product_name, generic_name, quantity, packaging, packaging_tags, brands, brands_tags, categories, categories_tags, categories_fr, origins, origins_tags, manufacturing_places, manufacturing_places_tags, labels, labels_tags, labels_fr, emb_codes, emb_codes_tags, first_packaging_code_geo, cities, cities_tags, purchase_places, stores, countries, countries_tags, countries_fr, ingredients_text, allergens, allergens_fr, traces, traces_tags, traces_fr, serving_size, no_nutriments, additives_n, additives, additives_tags, additives_fr, ingredients_from_palm_oil_n, ingredients_from_palm_oil, ingredients_from_palm_oil_tags, ingredients_that_may_be_from_palm_oil_n, ingredients_that_may_be_from_palm_oil, ingredients_that_may_be_from_palm_oil_tags, nutrition_grade_uk, nutrition_grade_fr, pnns_groups_1, pnns_groups_2, states, states_tags, states_fr, main_category, main_category_fr, image_url, image_small_url, energy_100g, energy-from-fat_100g, fat_100g, saturated-fat_100g, butyric-acid_100g, caproic-acid_100g, caprylic-acid_100g, capric-acid_100g, lauric-acid_100g, myristic-acid_100g, palmitic-acid_100g, stearic-acid_100g, arachidic-acid_100g, behenic-acid_100g, lignoceric-acid_100g, cerotic-acid_100g, montanic-acid_100g, melissic-acid_100g, monounsaturated-fat_100g, polyunsaturated-fat_100g, omega-3-fat_100g, alpha-linolenic-acid_100g, eicosapentaenoic-acid_100g, docosahexaenoic-acid_100g, omega-6-fat_100g, linoleic-acid_100g, arachidonic-acid_100g, gamma-linolenic-acid_100g, dihomo-gamma-linolenic-acid_100g, omega-9-fat_100g, oleic-acid_100g, elaidic-acid_100g, gondoic-acid_100g, mead-acid_100g, erucic-acid_100g, nervonic-acid_100g, trans-fat_100g, ...]\n", "\n", "[163 rows x 0 columns]"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["df[df.additives.notnull() & 
df.additives.str.contains(\"E4\")].head().T"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Idea behind the competition\n", "\n", "We want to know whether additives appear more frequently with certain products or certain compositions. We therefore try to predict the presence of additives from all the other variables. If a prediction model does better than chance, it means that some correlations exist. I used [dask](https://dask.pydata.org/en/latest/), but if you have enough memory, [pandas](https://pandas.pydata.org/) works too."]}, {"cell_type": "code", "execution_count": 10, "metadata": {"collapsed": true}, "outputs": [], "source": ["import dask\n", "import dask.dataframe as dd"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The following code, in particular the `dtype` dictionary, was built after several attempts, based on the warnings returned by the *dask* module."]}, {"cell_type": "code", "execution_count": 11, "metadata": {"scrolled": false}, "outputs": [{"data": {"text/html": ["
"], "text/plain": [" code url \\\n", "0 0000000003087 http://world-fr.openfoodfacts.org/produit/0000... \n", "1 0000000004530 http://world-fr.openfoodfacts.org/produit/0000... \n", "2 0000000004559 http://world-fr.openfoodfacts.org/produit/0000... \n", "3 0000000016087 http://world-fr.openfoodfacts.org/produit/0000... \n", "4 0000000016094 http://world-fr.openfoodfacts.org/produit/0000... \n", "\n", " creator created_t created_datetime \\\n", "0 openfoodfacts-contributors 1474103866 2016-09-17T09:17:46Z \n", "1 usda-ndb-import 1489069957 2017-03-09T14:32:37Z \n", "2 usda-ndb-import 1489069957 2017-03-09T14:32:37Z \n", "3 usda-ndb-import 1489055731 2017-03-09T10:35:31Z \n", "4 usda-ndb-import 1489055653 2017-03-09T10:34:13Z \n", "\n", " last_modified_t last_modified_datetime product_name \\\n", "0 1474103893 2016-09-17T09:18:13Z Farine de bl\u00e9 noir \n", "1 1489069957 2017-03-09T14:32:37Z Banana Chips Sweetened (Whole) \n", "2 1489069957 2017-03-09T14:32:37Z Peanuts \n", "3 1489055731 2017-03-09T10:35:31Z Organic Salted Nut Mix \n", "4 1489055653 2017-03-09T10:34:13Z Organic Polenta \n", "\n", " generic_name quantity ... fruits-vegetables-nuts_100g \\\n", "0 NaN 1kg ... NaN \n", "1 NaN NaN ... NaN \n", "2 NaN NaN ... NaN \n", "3 NaN NaN ... NaN \n", "4 NaN NaN ... 
NaN \n", "\n", " fruits-vegetables-nuts-estimate_100g collagen-meat-protein-ratio_100g \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "\n", " cocoa_100g chlorophyl_100g carbon-footprint_100g nutrition-score-fr_100g \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN 14.0 \n", "2 NaN NaN NaN 0.0 \n", "3 NaN NaN NaN 12.0 \n", "4 NaN NaN NaN NaN \n", "\n", " nutrition-score-uk_100g glycemic-index_100g water-hardness_100g \n", "0 NaN NaN NaN \n", "1 14.0 NaN NaN \n", "2 0.0 NaN NaN \n", "3 12.0 NaN NaN \n", "4 NaN NaN NaN \n", "\n", "[5 rows x 163 columns]"]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["ddf = dd.read_csv(\"c:/temp/fr.openfoodfacts.org.products.csv\", sep=\"\\t\", encoding=\"utf-8\", low_memory=False,\n", "    dtype={'allergens': 'object',\n", "           'cities_tags': 'object',\n", "           'emb_codes': 'object',\n", "           'emb_codes_tags': 'object',\n", "           'first_packaging_code_geo': 'object',\n", "           'generic_name': 'object',\n", "           'ingredients_from_palm_oil_tags': 'object',\n", "           'labels': 'object',\n", "           'labels_fr': 'object',\n", "           'labels_tags': 'object',\n", "           'manufacturing_places': 'object',\n", "           'manufacturing_places_tags': 'object',\n", "           'origins': 'object',\n", "           'origins_tags': 'object',\n", "           'stores': 'object',\n", "           'code': 'object',\n", "           'allergens_fr': 'object',\n", "           'cities': 'object',\n", "           'created_t': 'object',\n", "           'last_modified_t': 'object'})\n", "ddf.head()"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["\n"]}], "source": ["print(type(ddf))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We add the column to predict, `hasE`, a boolean that indicates the presence of an additive whose tag starts with ``'en:e'``, such as [E440](http://www.les-additifs-alimentaires.com/E440-pectines.php)."]}, {"cell_type": "code", "execution_count": 13, 
"metadata": {"scrolled": false}, "outputs": [{"data": {"text/html": ["
"], "text/plain": [" code url \\\n", "0 0000000003087 http://world-fr.openfoodfacts.org/produit/0000... \n", "1 0000000004530 http://world-fr.openfoodfacts.org/produit/0000... \n", "2 0000000004559 http://world-fr.openfoodfacts.org/produit/0000... \n", "3 0000000016087 http://world-fr.openfoodfacts.org/produit/0000... \n", "4 0000000016094 http://world-fr.openfoodfacts.org/produit/0000... \n", "\n", " creator created_t created_datetime \\\n", "0 openfoodfacts-contributors 1474103866 2016-09-17T09:17:46Z \n", "1 usda-ndb-import 1489069957 2017-03-09T14:32:37Z \n", "2 usda-ndb-import 1489069957 2017-03-09T14:32:37Z \n", "3 usda-ndb-import 1489055731 2017-03-09T10:35:31Z \n", "4 usda-ndb-import 1489055653 2017-03-09T10:34:13Z \n", "\n", " last_modified_t last_modified_datetime product_name \\\n", "0 1474103893 2016-09-17T09:18:13Z Farine de bl\u00e9 noir \n", "1 1489069957 2017-03-09T14:32:37Z Banana Chips Sweetened (Whole) \n", "2 1489069957 2017-03-09T14:32:37Z Peanuts \n", "3 1489055731 2017-03-09T10:35:31Z Organic Salted Nut Mix \n", "4 1489055653 2017-03-09T10:34:13Z Organic Polenta \n", "\n", " generic_name quantity ... fruits-vegetables-nuts-estimate_100g \\\n", "0 NaN 1kg ... NaN \n", "1 NaN NaN ... NaN \n", "2 NaN NaN ... NaN \n", "3 NaN NaN ... NaN \n", "4 NaN NaN ... 
NaN \n", "\n", " collagen-meat-protein-ratio_100g cocoa_100g chlorophyl_100g \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " carbon-footprint_100g nutrition-score-fr_100g nutrition-score-uk_100g \\\n", "0 NaN NaN NaN \n", "1 NaN 14.0 14.0 \n", "2 NaN 0.0 0.0 \n", "3 NaN 12.0 12.0 \n", "4 NaN NaN NaN \n", "\n", " glycemic-index_100g water-hardness_100g hasE \n", "0 NaN NaN False \n", "1 NaN NaN False \n", "2 NaN NaN False \n", "3 NaN NaN False \n", "4 NaN NaN False \n", "\n", "[5 rows x 164 columns]"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["ddfe = ddf.assign(hasE=ddf.apply(lambda row: isinstance(row.additives, str) and \"en:e\" in row.additives, \n", " axis=1, meta=bool))\n", "ddfe.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We keep only the products for which some information about their content is available, i.e. at least one of the `*_100g` nutrition columns is filled in."]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/plain": ["['energy_100g',\n", " 'energy-from-fat_100g',\n", " 'fat_100g',\n", " 'saturated-fat_100g',\n", " 'butyric-acid_100g',\n", " 'caproic-acid_100g',\n", " 'caprylic-acid_100g',\n", " 'capric-acid_100g',\n", " 'lauric-acid_100g',\n", " 'myristic-acid_100g',\n", " 'palmitic-acid_100g',\n", " 'stearic-acid_100g',\n", " 'arachidic-acid_100g',\n", " 'behenic-acid_100g',\n", " 'lignoceric-acid_100g',\n", " 'cerotic-acid_100g',\n", " 'montanic-acid_100g',\n", " 'melissic-acid_100g',\n", " 'monounsaturated-fat_100g',\n", " 'polyunsaturated-fat_100g',\n", " 'omega-3-fat_100g',\n", " 'alpha-linolenic-acid_100g',\n", " 'eicosapentaenoic-acid_100g',\n", " 'docosahexaenoic-acid_100g',\n", " 'omega-6-fat_100g',\n", " 'linoleic-acid_100g',\n", " 'arachidonic-acid_100g',\n", " 'gamma-linolenic-acid_100g',\n", " 'dihomo-gamma-linolenic-acid_100g',\n", " 'omega-9-fat_100g',\n", " 'oleic-acid_100g',\n", " 'elaidic-acid_100g',\n", " 
'gondoic-acid_100g',\n", " 'mead-acid_100g',\n", " 'erucic-acid_100g',\n", " 'nervonic-acid_100g',\n", " 'trans-fat_100g',\n", " 'cholesterol_100g',\n", " 'carbohydrates_100g',\n", " 'sugars_100g',\n", " 'sucrose_100g',\n", " 'glucose_100g',\n", " 'fructose_100g',\n", " 'lactose_100g',\n", " 'maltose_100g',\n", " 'maltodextrins_100g',\n", " 'starch_100g',\n", " 'polyols_100g',\n", " 'fiber_100g',\n", " 'proteins_100g',\n", " 'casein_100g',\n", " 'serum-proteins_100g',\n", " 'nucleotides_100g',\n", " 'salt_100g',\n", " 'sodium_100g',\n", " 'alcohol_100g',\n", " 'vitamin-a_100g',\n", " 'beta-carotene_100g',\n", " 'vitamin-d_100g',\n", " 'vitamin-e_100g',\n", " 'vitamin-k_100g',\n", " 'vitamin-c_100g',\n", " 'vitamin-b1_100g',\n", " 'vitamin-b2_100g',\n", " 'vitamin-pp_100g',\n", " 'vitamin-b6_100g',\n", " 'vitamin-b9_100g',\n", " 'folates_100g',\n", " 'vitamin-b12_100g',\n", " 'biotin_100g',\n", " 'pantothenic-acid_100g',\n", " 'silica_100g',\n", " 'bicarbonate_100g',\n", " 'potassium_100g',\n", " 'chloride_100g',\n", " 'calcium_100g',\n", " 'phosphorus_100g',\n", " 'iron_100g',\n", " 'magnesium_100g',\n", " 'zinc_100g',\n", " 'copper_100g',\n", " 'manganese_100g',\n", " 'fluoride_100g',\n", " 'selenium_100g',\n", " 'chromium_100g',\n", " 'molybdenum_100g',\n", " 'iodine_100g',\n", " 'caffeine_100g',\n", " 'taurine_100g',\n", " 'ph_100g',\n", " 'fruits-vegetables-nuts_100g',\n", " 'fruits-vegetables-nuts-estimate_100g',\n", " 'collagen-meat-protein-ratio_100g',\n", " 'cocoa_100g',\n", " 'chlorophyl_100g',\n", " 'carbon-footprint_100g',\n", " 'nutrition-score-fr_100g',\n", " 'nutrition-score-uk_100g',\n", " 'glycemic-index_100g',\n", " 'water-hardness_100g']"]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["g100 = [_ for _ in ddf.columns if '100g' in _]\n", "g100"]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"data": {"text/plain": ["(354144, 164)"]}, "execution_count": 16, "metadata": {}, 
"output_type": "execute_result"}], "source": ["ddfe.compute().shape"]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [{"data": {"text/html": ["
"], "text/plain": [" code url \\\n", "1 0000000004530 http://world-fr.openfoodfacts.org/produit/0000... \n", "2 0000000004559 http://world-fr.openfoodfacts.org/produit/0000... \n", "3 0000000016087 http://world-fr.openfoodfacts.org/produit/0000... \n", "4 0000000016094 http://world-fr.openfoodfacts.org/produit/0000... \n", "5 0000000016100 http://world-fr.openfoodfacts.org/produit/0000... \n", "\n", " creator created_t created_datetime last_modified_t \\\n", "1 usda-ndb-import 1489069957 2017-03-09T14:32:37Z 1489069957 \n", "2 usda-ndb-import 1489069957 2017-03-09T14:32:37Z 1489069957 \n", "3 usda-ndb-import 1489055731 2017-03-09T10:35:31Z 1489055731 \n", "4 usda-ndb-import 1489055653 2017-03-09T10:34:13Z 1489055653 \n", "5 usda-ndb-import 1489055651 2017-03-09T10:34:11Z 1489055651 \n", "\n", " last_modified_datetime product_name generic_name \\\n", "1 2017-03-09T14:32:37Z Banana Chips Sweetened (Whole) NaN \n", "2 2017-03-09T14:32:37Z Peanuts NaN \n", "3 2017-03-09T10:35:31Z Organic Salted Nut Mix NaN \n", "4 2017-03-09T10:34:13Z Organic Polenta NaN \n", "5 2017-03-09T10:34:11Z Breadshop Honey Gone Nuts Granola NaN \n", "\n", " quantity ... collagen-meat-protein-ratio_100g cocoa_100g chlorophyl_100g \\\n", "1 NaN ... NaN NaN NaN \n", "2 NaN ... NaN NaN NaN \n", "3 NaN ... NaN NaN NaN \n", "4 NaN ... NaN NaN NaN \n", "5 NaN ... 
NaN NaN NaN \n", "\n", " carbon-footprint_100g nutrition-score-fr_100g nutrition-score-uk_100g \\\n", "1 NaN 14.0 14.0 \n", "2 NaN 0.0 0.0 \n", "3 NaN 12.0 12.0 \n", "4 NaN NaN NaN \n", "5 NaN NaN NaN \n", "\n", " glycemic-index_100g water-hardness_100g hasE s100 \n", "1 NaN NaN False 17 \n", "2 NaN NaN False 17 \n", "3 NaN NaN False 13 \n", "4 NaN NaN False 5 \n", "5 NaN NaN True 9 \n", "\n", "[5 rows x 165 columns]"]}, "execution_count": 17, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy\n", "\n", "# s100 counts, for each product, how many *_100g columns are filled in\n", "ddfe100 = ddfe.assign(s100=ddf.apply(lambda row: sum(0 if numpy.isnan(row[g]) else 1 for g in g100), \n", " axis=1, meta=float))\n", "ddfe100 = ddfe100[ddfe100.s100 > 0]\n", "ddfe100.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The next step takes a while and is not very efficient. To make it faster, we would need something like dask that relies only on pandas dataframes. Coffee break. Note that `to_csv` writes one file per partition: dask replaces the `*` in the file name with the partition number."]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": ["ddfe100.to_csv(\"ddfe100*.csv\", sep=\"\\t\", encoding=\"utf-8\", index=False)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["I think I really am going to write something like dask using only pandas."]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [{"data": {"text/plain": ["['ddfe10000.csv',\n", " 'ddfe10001.csv',\n", " 'ddfe10002.csv',\n", " 'ddfe10003.csv',\n", " 'ddfe10004.csv',\n", " 'ddfe10005.csv',\n", " 'ddfe10006.csv',\n", " 'ddfe10007.csv',\n", " 'ddfe10008.csv',\n", " 'ddfe10009.csv',\n", " 'ddfe10010.csv',\n", " 'ddfe10011.csv',\n", " 'ddfe10012.csv',\n", " 'ddfe10013.csv',\n", " 'ddfe10014.csv',\n", " 'ddfe10015.csv']"]}, "execution_count": 19, "metadata": {}, "output_type": "execute_result"}], "source": ["dffefiles = [_ for _ in os.listdir(\".\") if \"ddfe\" in _]\n", "dffefiles"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Split...\n", 
"\n", "We enforce the same column types on every dataframe, then split each file into a training set (50%), a test set (25%) and an evaluation set (25%)."]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": ["types = dict(zip(ddfe100.columns, ddfe100.dtypes))"]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["name ddfe10000.csv\n", "name ddfe10001.csv\n", "name ddfe10002.csv\n", "name ddfe10003.csv\n", "name ddfe10004.csv\n", "name ddfe10005.csv\n", "name ddfe10006.csv\n", "name ddfe10007.csv\n", "name ddfe10008.csv\n", "name ddfe10009.csv\n", "name ddfe10010.csv\n", "name ddfe10011.csv\n", "name ddfe10012.csv\n", "name ddfe10013.csv\n", "name ddfe10014.csv\n", "name ddfe10015.csv\n"]}], "source": ["from sklearn.model_selection import train_test_split\n", "\n", "for i, name in enumerate(dffefiles):\n", "    print(\"name\", name)\n", "    df = pandas.read_csv(name, sep=\"\\t\", encoding=\"utf-8\", dtype=types)\n", "    # 50% train; the other half is split again into 25% test and 25% eval\n", "    df_train, df_test = train_test_split(df, test_size=0.5)\n", "    df_test, df_eval = train_test_split(df_test, test_size=0.5)\n", "    df_train.to_csv(\"off_train{0}.txt\".format(i), sep=\"\\t\", index=False, encoding=\"utf-8\")\n", "    df_test.to_csv(\"off_test{0}.txt\".format(i), sep=\"\\t\", index=False, encoding=\"utf-8\")\n", "    df_eval.to_csv(\"off_eval{0}.txt\".format(i), sep=\"\\t\", index=False, encoding=\"utf-8\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Ah, I almost forgot: the *additives* column needs some tweaking to avoid a memory leak, and we recompute the `hasE` column to be sure, since the substring test ``'en:e'`` used above matches any tag starting with ``en:e``, not only additive codes such as ``en:e440``."]}, {"cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
additiveshasE
0[ russet-potatoes -> en:russet-potatoes ] [...False
1[ grade-a-reduced-fat-milk -> en:grade-a-redu...True
2[ cake -> en:cake ] [ sugar -> en:sugar ] ...True
3[ whole-grain-yellow-corn -> en:whole-grain-y...True
4[ fresh-cucumbers -> en:fresh-cucumbers ] [...True
\n", "
"], "text/plain": [" additives hasE\n", "0 [ russet-potatoes -> en:russet-potatoes ] [... False\n", "1 [ grade-a-reduced-fat-milk -> en:grade-a-redu... True\n", "2 [ cake -> en:cake ] [ sugar -> en:sugar ] ... True\n", "3 [ whole-grain-yellow-corn -> en:whole-grain-y... True\n", "4 [ fresh-cucumbers -> en:fresh-cucumbers ] [... True"]}, "execution_count": 22, "metadata": {}, "output_type": "execute_result"}], "source": ["df[[\"additives\", \"hasE\"]].head()"]}, {"cell_type": "code", "execution_count": 22, "metadata": {"collapsed": true}, "outputs": [], "source": ["import re\n", "reg = re.compile(\"[[](.*?)[]]\")\n", "addi = re.compile(\"(en[:]e[0-9])\")"]}, {"cell_type": "code", "execution_count": 23, "metadata": {"scrolled": false}, "outputs": [{"data": {"text/plain": ["([],\n", " ['en:basmati-rice',\n", " 'en:organic-white-basmati-rice',\n", " 'en:rice',\n", " 'en:white-basmati-rice'])"]}, "execution_count": 24, "metadata": {}, "output_type": "execute_result"}], "source": ["def has_emachine(v):\n", "    \"\"\"Returns (raw entries naming an E-number additive, sorted list of the other en: tags).\"\"\"\n", "    if isinstance(v, (list, pandas.core.series.Series)):\n", "        rem = []\n", "        add = []\n", "        for _ in v:\n", "            if isinstance(_, str):\n", "                fd = reg.findall(_)\n", "                for __ in fd:\n", "                    if \" en:e\" in __ and addi.search(__):\n", "                        add.append(__)  # .split(\"->\")[-1].strip())\n", "                    elif \" en:\" not in __:\n", "                        continue\n", "                    else:\n", "                        rem.append(__.split(\"->\")[-1].strip())\n", "            else:\n", "                continue\n", "        return add, list(sorted(set(rem)))\n", "    elif isinstance(v, float) and numpy.isnan(v):\n", "        return [], []\n", "    elif isinstance(v, str):\n", "        if \",\" in v:\n", "            raise Exception('{0}\\n{1}'.format(type(v), v))\n", "        return has_emachine([v])\n", "    else:\n", "        # unexpected type\n", "        raise Exception('{0}\\n{1}'.format(type(v), v))\n", "\n", "hasE, clean = has_emachine(df.loc[1,\"additives\"])\n", "hasE, clean"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We recompute *hasE* and clean *additives* on every chunk, then merge the chunks back into one file per set (train, test, eval)."]}, {"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [{"name": 
"stdout", "output_type": "stream", "text": ["name off_train0.txt (11307, 165)\n", "name off_train1.txt (11484, 165)\n", "name off_train10.txt (7296, 165)\n", "name off_train11.txt (7609, 165)\n", "name off_train12.txt (7780, 165)\n", "name off_train13.txt (7908, 165)\n", "name off_train14.txt (8732, 165)\n", "name off_train15.txt (5315, 165)\n", "name off_train2.txt (10963, 165)\n", "name off_train3.txt (11166, 165)\n", "name off_train4.txt (11113, 165)\n", "name off_train5.txt (11534, 165)\n", "name off_train6.txt (11922, 165)\n", "name off_train7.txt (9489, 165)\n", "name off_train8.txt (7725, 165)\n", "name off_train9.txt (7746, 165)\n", "merged (149089, 165)\n", "name off_test0.txt (5654, 165)\n", "name off_test1.txt (5742, 165)\n", "name off_test10.txt (3648, 165)\n", "name off_test11.txt (3805, 165)\n", "name off_test12.txt (3890, 165)\n", "name off_test13.txt (3954, 165)\n", "name off_test14.txt (4366, 165)\n", "name off_test15.txt (2657, 165)\n", "name off_test2.txt (5481, 165)\n", "name off_test3.txt (5583, 165)\n", "name off_test4.txt (5557, 165)\n", "name off_test5.txt (5767, 165)\n", "name off_test6.txt (5961, 165)\n", "name off_test7.txt (4745, 165)\n", "name off_test8.txt (3863, 165)\n", "name off_test9.txt (3873, 165)\n", "merged (74546, 165)\n", "name off_eval0.txt (5654, 165)\n", "name off_eval1.txt (5743, 165)\n", "name off_eval10.txt (3648, 165)\n", "name off_eval11.txt (3805, 165)\n", "name off_eval12.txt (3890, 165)\n", "name off_eval13.txt (3955, 165)\n", "name off_eval14.txt (4366, 165)\n", "name off_eval15.txt (2658, 165)\n", "name off_eval2.txt (5482, 165)\n", "name off_eval3.txt (5583, 165)\n", "name off_eval4.txt (5557, 165)\n", "name off_eval5.txt (5768, 165)\n", "name off_eval6.txt (5961, 165)\n", "name off_eval7.txt (4745, 165)\n", "name off_eval8.txt (3863, 165)\n", "name off_eval9.txt (3873, 165)\n", "merged (74551, 165)\n"]}], "source": ["off = [_ for _ in os.listdir(\".\") if \"off\" in _ and \"all\" not in _]\n", "\n", "for cont 
in ['train', 'test', 'eval']:\n", "    sub = [_ for _ in off if cont in _]\n", "    dfs = []\n", "    for name in sub:\n", "        df = pandas.read_csv(name, sep=\"\\t\", encoding=\"utf-8\", dtype=types)\n", "        print(\"name\", name, df.shape)\n", "        # hasE is recomputed from additives, then additives keeps only the non-E tags\n", "        df[\"hasE\"] = df[\"additives\"].apply(lambda x: len(has_emachine(x)[0]) > 0)\n", "        df[\"additives\"] = df[\"additives\"].apply(lambda x: \";\".join(has_emachine(x)[1]))\n", "        dfs.append(df)\n", "    df = pandas.concat(dfs, axis=0)\n", "    print(\"merged\", df.shape)\n", "    df.to_csv(\"off_{0}_all.txt\".format(cont), sep=\"\\t\", index=False, encoding=\"utf-8\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["There are probably still one or two data leaks in the other columns."]}, {"cell_type": "markdown", "metadata": {}, "source": ["The evaluation set is split into features (`off_eval_all_X.txt`, without *hasE*) and target (`off_eval_all_Y.txt`)."]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [{"data": {"text/plain": ["165"]}, "execution_count": 26, "metadata": {}, "output_type": "execute_result"}], "source": ["len(types)"]}, {"cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": ["df_eval = pandas.read_csv(\"off_eval_all.txt\", sep=\"\\t\", dtype=types, encoding=\"utf-8\")"]}, {"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": ["df_eval_X = df_eval.drop(\"hasE\", axis=1)\n", "df_eval_X.to_csv(\"off_eval_all_X.txt\")\n", "df_eval[[\"hasE\"]].to_csv(\"off_eval_all_Y.txt\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## First model"]}, {"cell_type": "code", "execution_count": 28, "metadata": {"collapsed": true}, "outputs": [], "source": ["df_train = pandas.read_csv(\"off_train_all.txt\", sep=\"\\t\", dtype=types, encoding=\"utf-8\")"]}, {"cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [{"data": {"text/plain": ["(149089, 165)"]}, "execution_count": 30, "metadata": {}, "output_type": "execute_result"}], "source": ["df_train.shape"]}, {"cell_type": "code", 
"execution_count": 30, "metadata": {}, "outputs": [], "source": ["X = df_train[g100].fillna(0)\n", "Y = df_train['hasE']"]}, {"cell_type": "code", "execution_count": 31, "metadata": {"collapsed": true}, "outputs": [], "source": ["from sklearn.linear_model import LogisticRegression\n", "clf = LogisticRegression()"]}, {"cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [{"data": {"text/plain": ["LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", " verbose=0, warm_start=False)"]}, "execution_count": 33, "metadata": {}, "output_type": "execute_result"}], "source": ["clf.fit(X, Y)"]}, {"cell_type": "code", "execution_count": 33, "metadata": {"collapsed": true}, "outputs": [], "source": ["pred = clf.predict(X)"]}, {"cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[32549, 33404],\n", " [15803, 67333]], dtype=int64)"]}, "execution_count": 35, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.metrics import confusion_matrix\n", "confusion_matrix(Y, pred)"]}, {"cell_type": "code", "execution_count": 35, "metadata": {"collapsed": true}, "outputs": [], "source": ["df_test = pandas.read_csv(\"off_test_all.txt\", sep=\"\\t\", dtype=types, encoding=\"utf-8\")"]}, {"cell_type": "code", "execution_count": 36, "metadata": {"collapsed": true}, "outputs": [], "source": ["X_test = df_test[g100].fillna(0)\n", "Y_test = df_test['hasE']"]}, {"cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[16309, 16849],\n", " [ 7945, 33443]], dtype=int64)"]}, "execution_count": 38, "metadata": {}, "output_type": "execute_result"}], "source": ["pred = clf.predict(X_test)\n", "confusion_matrix(Y_test, pred)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## ROC"]}, 
{"cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["[[ 0.23453245 0.76546755]\n", " [ 0.44914289 0.55085711]\n", " [ 0.67701244 0.32298756]]\n", "[ True True False]\n"]}], "source": ["y_proba = clf.predict_proba(X_test)\n", "y_pred = clf.predict(X_test)\n", "print(y_proba[:3])\n", "print(y_pred[:3])"]}, {"cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [{"data": {"text/plain": ["(numpy.ndarray, pandas.core.series.Series, numpy.ndarray)"]}, "execution_count": 40, "metadata": {}, "output_type": "execute_result"}], "source": ["y_test = Y_test.values\n", "type(y_pred), type(Y_test), type(y_test)"]}, {"cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([ 0.76546755, 0.55085711, 0.67701244])"]}, "execution_count": 41, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy\n", "prob_pred = numpy.array([(y_proba[i, 1] if c else y_proba[i, 0]) for i, c in enumerate(y_pred)])\n", "prob_pred[:3]"]}, {"cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": ["from sklearn.metrics import roc_curve\n", "fpr, tpr, th = roc_curve(y_pred == y_test, prob_pred)"]}, {"cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [{"data": {"text/plain": [""]}, "execution_count": 43, "metadata": {}, "output_type": "execute_result"}, {"data": {"image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYoAAAEXCAYAAACzhgONAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xd4FFX3wPHvSQgECITeS+i9FxHpRTqoKIK+gvITpIqo\niAURQREQFaSjIviCAiIvIB3pTSAgLYD0EqSEEnr6/f0xCwkIyQLZzG5yPs/Dw8zs7OzJEHJy5957\nrhhjUEoppR7Ey+4AlFJKuTdNFEoppeKliUIppVS8NFEopZSKlyYKpZRS8dJEoZRSKl6aKJRSSsVL\nE4VS8RCR4yJyS0Sui8hZEZkqIn5xXq8pIqtE5JqIXBGR30Wk9D3XyCgio0TkpOM6Rxz72ZL+K1Lq\n4WmiUCphrYwxfkBFoBLwAYCIPAksB+YDeYBCwC5go4gUdpyTGlgJlAGaAhmBJ4GLQPWk/TKUejSi\nM7OVejAROQ68boz5w7E/AihjjGkhIuuBPcaYHve8ZwkQYozpKCKvA58DRYwx15M4fKUShbYolHKS\niOQDmgGHRSQdUBP49T6nzgYaO7YbAUs1SShPpolCqYTNE5FrwCngPPAJkAXr/8+Z+5x/Brjd/5D1\nAeco5TE0USiVsGeMMRmAekBJrCRwGYgBct/n/NzABcf2xQeco5TH0EShlJOMMWuBqcBIY8wNYDPw\nwn1ObYfVgQ3wB9BERNInSZBKuYAmCqUeziigsYhUAN4HOonImyKSQUQyi8hnWKOaPnWc/1+sR1a/\niUhJEfESkawi8qGINLfnS1Dq4WiiUOohGGNCgJ+AgcaYDUAT4DmsfogTWMNnaxljDjnOD8fq0D4A\nrACuAluxHl9tSfIvQKlHoMNjlVJKxUtbFEoppeKliUIppVS8NFEopZSKlyYKpZRS8dJEoZRSKl6p\n7A7gUWTLls0EBATYHYZSSnmM7du3XzDGZH+U93pkoggICCAwMNDuMJRSymOIyIlHfa8+elJKKRUv\nTRRKKaXipYlCKaVUvDRRKKWUipcmCqWUUvFyaaIQkSkicl5E9j7gdRGRb0XksIjsFpHKroxHKaXU\nw3N1i2Iq0DSe15sBxRx/ugITXByPUkqph+TSeRTGmHUiEhDPKW2An4xV6/xPEckkIrmNMbrGsFJK\nPa6wUKJPrOWHiWsf6zJ2T7jLi7X6123BjmP/ShQi0hWr1UGBAgWSJDillPI4kbfg+FL461s4tYb9\nZ3LQfXS3x7qkx3RmG2MmG2OqGmOqZs/+SLPQlVIqeYq4Dvumw5wm8G06bs55EU6tAaBs7hA+ez36\nsS5vd4viNJA/zn4+xzGllFLxiYmG0xtg51g4ugiibmEMzNldmj4LWjLpIz9a9eoBabPyAfDhd58/\n8kfZnSgWAL1EZCbwBHBF+yeUUuo+jIHQI3BiBZxeDydXws3zd14+mqoxvX6ty5KNUQBMDyxFq7RZ\nE+WjXZooROQXoB6QTUSCgU8AHwBjzERgMdAcOAzcBF5zZTxKKeVxQvbA/hlw+H9w+eDdr2UMICLg\nOUauq8OQkXsJC4siUyZfhg1rSJcuVRItBFePeuqQwOsG6OnKGJRSyuPcOAfrP4CgH+8+7psF8jwJ\nAU0hz5McDC3AM8/OZv/+nQC8/HI5vvrqaXLm9EvUcOx+9KSUUgqsDulDc2H71xCy6+7Xyr8BxZ+H\nfHXAO/Wdw3nSR3DtWgTFimVhwoQWNGxY2CWhaaJQSim7REfCnu9h92S4uBdiomJfK9wCqrwNeWvd\nSQ4xMYYZ/93Fs8+Wws8vNX5+qVm27D8ULpwZX1/X/TjXRKGUUkkp8iYcXw6H58KRBRB+Jfa1XNWg\n6DNQrguku3saQFDQebp1W8SGDSd5991zfPnl0wCULu366QKaKJRSytXCr1hDWA/NhWNLIOpm7GtZ\nSkKF7lDqZbjPKKWbNyMZMmQtI0duJioqhhw50lOlSp4kDF4Th
VJKuUbkLTg4G/6eBSf+gJjI2Ndy\nVYNibaHos5Cl+AMvsXjxIXr2XMzx46GIQLduVRg6tCGZM6dNgi8gliYKpZRKTJcPwZ+fWUNajWNG\ntHhBvrpQ7Dnr0VLGhMsQbd16mhYtfgagQoWcTJzYkho18rky8gfSRKGUUo8rJsqaALdjtPVo6bas\nZaBkeyjfFdLlSPAyxhhEBIDq1fPyyivlqVQpF717P0GqVPZVXNJEoZRSjyr8CqzuY9VZut16QKBg\nY6jU2xq55PjBn5CtW0/Tq9diJk1qSaVKuQH46adnXRT4w9FEoZRSDyMm2mo5rH3n7uOpM1jDWct1\ngQx5nb7clSthfPTRKsaP34Yx8Pnn65kzp10iB/14NFEopVRCIm/Bmc1Wy+He2dI5KsETH1l9D17e\nTl/SGMOsWUH07buMs2evkyqVF2+/XYOBA+smcvCPTxOFUkrdz4W91milkyvh6MJ/v16pN1Tr/1Ct\nh9tOnrxCly6/s3z5EQBq1szPxIktKFcu5+NG7RKaKJRSCuDGWTi+DM5sgRPLrUqtcXmngZIvQbnX\nrXpLTvY93I+Xl7Bp0ykyZ/ZlxIjGdO5cCS+vR7+eq2miUEqlTDdDrMV99v10/xZDGn8o0AiKtLKG\ntvoHPNbHbd58iurV8+Lt7UW+fBmZM+cFKlXKTY4c6R/ruklBE4VSKvmKjoSrx+HS33D5b6tM9+3t\nG2f/fX5AE8jfAHI/YbUa4hTge1Tnz9/g3XeX89//7mbMmGb06lUdgCZNij72tZOKJgqlVPIQEwUH\nfoGQ3bEJ4cqRuwvtxeWdBvLWhvx1oUhryFbusR4n/SucGMMPP+ygf/8/uHw5jDRpvImIeLwlSe2i\niUIp5dmMgeC1sP5Da2TSvTIUgCwlIHMJyFzc2s5SAjLkt2ZMu8CePefo1m0RmzadAqBx48KMH9+C\nokWzuOTzXE0ThVLKM0WFwbGlsPlTCLEW7sErFVR+C3JVt5JBpqLgky5Jw9q48SR1604lOtqQK5cf\no0Y1oV27MndmXHsiTRRKKc9hjNUBvf0bOPp77HHfzFC4JTw1BDIWtC08gBo18lGlSh6qVcvDZ581\nIFMmX1vjSQyaKJRS7u/6P1aRvW0j4NaF2OOZilqdzg3HWTOjbXDq1BXef38lI0Y0Im/ejHh7e7F+\n/WukTu385Dt3p4lCKeW+wq9aa0fvGh97zMcPAp62Jrvlrm5baFFRMXz77RYGDlzNjRuReHkJ//2v\nVZspOSUJ0EShlHI3MdFwZD4cng8Hfo4dtZT7CSjbGUq9Aj5Jux7Dvf78M5hu3Raya9c5ANq2LcWw\nYQ1tjcmVNFEopewXcd1aFvT4Mjg8DyKuxr6WsyrU+BiKtrYvPofLl2/xwQcrmTx5O8ZAQEAmxo5t\nRosWD158KDnQRKGUss/hBfDnEDgXePfxjAWhZAco3RGylrIntvs4fjyU777bgbe3F++++yQff1yX\ndOl87A7L5TRRKKWS1rVga3GfFV3//VrxdvDEB5C9QqJOfnsc//xzjTx5rI7ySpVyM3ZsM2rXLkjZ\nsgkvRJRcaKJQSrleTBTsmgTHFjtWgDOxr5V7HWp+Cn55bAvvfsLCohg+fANDh25gzpwXaNWqBADd\nu1ezObKkp4lCKeU653dC0FT4e9bdtZUKt4AibaxlQm0a1hqflSuP0r37Ig4dugRYnde3E0VKpIlC\nKZW4Qo/AwTlwYGbsjGmwSmmU7wrluzi1frQdzp27zjvvLGfGjD0AlCqVjQkTWlC3boC9gdlME4VS\nKnGc3WbNmD4wkzuPltJkgmJtocyrkLemy2orJYbNm0/RvPnPhIaG4eubioED6/DOOzWT3ZyIR6GJ\nQin1eK6ehDV94dBca9/LB4q/YK3jUPQZSOUZJSzKlcuJn19qatTIx7hxzSlcOLPdIbkNTRRKqYdn\nDJxaDVu+gJN/xB4v3g5qf
Q6Z3X+thevXI/jqq028805N/PxS4+eXmq1bXydXLj+PLuDnCpoolFLO\nM8aaELfxY7gYZB3zTmMV5KvWz5o97QHmzTtA795LCA6+yrVrEYwc+TQAuXO7X8e6O9BEoZRyzv6f\nYeswuGB19JIqnZUcKvWGtFntjc1JJ06E8uabS1mw4G8AqlbNQ4cOZW2Oyv25NFGISFNgNOANfG+M\nGXbP6wWAaUAmxznvG2MWuzImpdRDiI6AEytgxRtw/bR1zMcPnvzEGr2Uxt/e+JwUGRnNqFF/MmjQ\nWm7ejCRDhtQMHdqQ7t2r4u3tvh3s7sJliUJEvIFxQGMgGNgmIguMMfvinDYAmG2MmSAipYHFQICr\nYlJKOSkmGnaMhsCRcOOMdczHD8q+BrWHJfliQI9rzZrjvPee1ZfSrl0ZvvmmyZ3Z1iphrmxRVAcO\nG2OOAojITKANEDdRGCCjY9sf+MeF8SilnBG8zirt/c8maz9jQWv2dMWe1gJBHiI8PIo0aawfcY0b\nF+HNN6vTrFkxmjZ1/452d+N0ohCRtEABY8zfTr4lL3Aqzn4wcG9P1yBguYj0BtIDjZyNRymViIyB\nnePh0BxrBTmAdDmh0UQo2sZt6i45wxjD9Om76d//DxYteolKlXIDMHp0M5sj81xOPZwTkVbATmCp\nY7+iiCxIhM/vAEw1xuQDmgP/Fbn/jBwR6SoigSISGBISkggfrZQi7DKsfQ9+LAGresUmiQrdodNu\nKPaMRyWJv/++QMOGP9Gx4zzOnLnOjz/uTPhNKkHOtigGYT1KWgNgjNkpIoUSeM9pIH+c/XyOY3H9\nH9DUcc3NIuILZAPO33sxY8xkYDJA1apVzb2vK6UeQuhRWNcvdpIcWH0QlXpDlb6QLrt9sT2CW7ci\n+eKLDQwfvpGIiGiyZUvHyJGN6dixgt2hJQvOJopIY8yVeyahJPTDehtQzJFQTgPtgZfuOeck0BCY\nKiKlAF9AmwtKuYIxVpmNv8bA/umxx9PlsKq3lusCXp5XrmL79n948cU5HDlyGYDXX6/EsGGNyJrV\nszrc3ZmziSJIRF4CvEWkGPAmsCm+NxhjokSkF7AMa+jrFGNMkIgMBgKNMQuAd4DvRKQvVuJ51Rij\nrQWlEtuVYzCzDlwPjj2Wvx7U+gLy1LAtrMSQM6cf587doEyZ7Eyc2JJatQrYHVKyI878XBaRdMBH\nwNOOQ8uAz4wxYS6M7YGqVq1qAgMDEz5RqZQsOsIq0Lfvv7FlNlKlhRyV4OnvIGtpe+N7RNHRMcye\nHUS7dmXuzIEIDPyHChVy4uPjeS2ipCIi240xVR/lvU61KIwxN4GPRORzx7ZSyl2ZGFj7rlXJ9Q6B\nws2h3jeQuZhtoT2uHTvO0K3bQrZt+4eLF2/Rq1d1wJphrVzngYlCRHyMMZGO7ZrA94AfUEBEKgBv\nGGN6JE2YSqkEXT4EmwfDgV/ARFvH/AtD9f5Q9DlIl83e+B7DtWvhDBy4mm+/3UpMjCFv3gwUKOAZ\ns8KTg/haFF1FZJcxZgPwDdAEWABgjNklInWSIkClVDyMgT0/wIYP4VaccSDiDU1/hFIvu/UaEAkx\nxjB37n769FnK6dPX8PIS+vatwaef1iNDhjR2h5dixJcoJmIliA0AxphT94x6inZhXEqp+Nw4C/um\nw/r+1qOm20q+BGU7Wx3VHjiC6V7z5//N88//CkC1anmYNKnlnQl0Kuk8MFEYY6KxRjcBnHI8fjIi\n4gP0AfYnQXxKqdsirsHRxbD1CwjZFXvcNzNU7guV+0CajA9+vwdq1ao4DRsWom3bUnTtWkUL+NnE\n2eGx3bCqwObFmhOxHOjpqqCUUg7GwMmVsP1rq/ZS+BXruHhBzqpQ/QMo0hK8kseKAevXn6B//z/4\n9dcXyJs3I97eXqxY8YouJGQzZ0c9XQBednEsSqnbYqKsoa2BI+9uPeSsaq1BXfY1SJ/Tvvg
S2cWL\nN3nvvRVMmWKV3Bg2bANjxjQH0CThBpxKFCIyAvgMuIVV76kC8JYxZnq8b1RKPZzzO2HL59YjpijH\nSPS02aF0RyjVAXJU9qjaSwkxxjBt2i7efXc5Fy/ewsfHi/ffr8UHH9SyOzQVh7Pt1aeNMe+JyLNY\nVWBfAFYDmiiUelxhobB7Evw9G87viD2eJhPUHATlu0Gq5DfC5+DBi3Tp8jvr1p0AoH79AMaPb0HJ\nkp47jDe5cjZR3D6vBfCLMeaSNgeVekwnV8O2EXB8aewxn/SQq7q19kOR1uDtY198Lnb9egQbNpwk\ne/Z0fP11E15+uZw+ZnJTziaKhSJyAOvRU3cRyQ7YUr5DKY9266K17sPuyXfXXcpUBJ763Fr7IZWv\nffG52Pbt/1ClijWLunLl3Myc2ZaGDQuTJUtamyNT8XGq1hOAiGQBrhhjoh21nzIaY866NLoH0FpP\nyuOE7LGqtu75njuFl1NntDqmq/SF7OVsDc/V/vnnGn37LmP27CAWLGhPq1Yl7A4pxXF5rScReQFY\n6kgSA4DKWJ3btiQKpTxCxDUImmaNXvpnY+zxgCZQ5jXr0ZJP8v5NOjo6hvHjt/HRR6u4di2CdOl8\nCAnRcnGextlHTx8bY34VkVpYpTxGAhP499KmSqVsxsCxJbD5Uzj/F8REWse9UkGZV6HK25C1lK0h\nJpXt2//hjTcWsn37GQBaty7BmDHNtEaTB3I2Udwu19ECmGCMmS8ig1wTklIeJibamhR3fDkcmQ+h\nh2Nfy1kVij1rLS3qm9m+GJPYvHkHaNt2NjExhvz5MzJmTDPatClpd1jqETmbKE6LyCSgMTBcRNLg\n5HrbSiVblw7Cwdmw90e4cjT2eNpsUPx5qNoPMhW2Lz4bNWpUmIIF/XnuuVIMGlQPP7/UdoekHoOz\niaId1trWI40xoSKSG+jnurCUcmPnd8KSjnBhT+yxDPmhxIuQry4UbJSsRy7dz9Gjlxk8eC1jxzbH\nzy81fn6pCQrqQdq0yXd4b0ryMAsXzRWRHCJye53BA64LSyk3FHoEVr0JxxbHHivTCYo8A4VbJOs5\nDw8SERHNyJGbGDJkHWFhUeTJk4GhQxsCaJJIRpwd9dQa+ArIA5wHCmAlijKuC00pNxAVZj1a2voF\nXDvlOChQpBU0+RHSZrE1PDutW3eCbt0Wsn//BQD+85/y9Omj41uSI2cfPQ0BagB/GGMqiUh9oIPr\nwlLKZjfOwraRsO+nuxcEKtwKan4KOSvZF5vNLly4Sb9+K5g61SrgV7x4ViZMaEGDBoVsjky5irOJ\nItIYc1FEvETEyxizWkSGuzQypZKaMRC8FgK/gpN/WK0JgOwVoFwX6zFTaj97Y3QDmzefYurUnaRJ\n482HH9amf/+nSJMmeZQ5V/fn7L9uqIj4AeuAGSJyHohyXVhKJaEb52DXRDgww1p3+rasZaDeV9YE\nuRQuJOQG2bOnB6BVqxIMGVKfF18sQ7FiWW2OTCUFp0p4iEh6rNpOgrUuhT8wwxhz0bXh3Z+W8FCJ\n4tZFWPce7J8B0eHWsVRpofQrUO51yFXN3vjcwI0bEQwZso7Ro7ewaVNnXYbUg7m8hIcx5kac3WmP\n8kFKuY1rp2FNXzj4a+yxgCbWzOlibVPk6KX7WbToID17LubEiSuIwJo1xzVRpFDxJgoRuYZVwUy4\nU8nMegkwxpjktUCvSt6unbZKa+ydAsZRbCDPU/DEh1C4ub2xuZHg4Kv06bOUuXP3A1CxYi4mTmzB\nE0/kszkyZZd4E4UxJkNSBaKUy0TegPUfwl/fxh4r+DTU+kwfL91j0aKDtG//G9evR5A+vQ9DhtSn\nd+8nSJVKCzGkZM7Oo6gBBBljrjn2MwCljTFbXBmcUo8l/ArsHGetOx122TpWsDHUGAj5dKnN+6lQ\nIRcAzz1XilGjmpA/vxbwU86PepqAVVr8thv3OaaUe4i
4bk2Q2zI09ljOKlB7mFVeQ90RGhrGxImB\n9OtXE29vL/Lly8jevd0pWDCT3aEpN+JsohATZ3iUMSZGRHTgtHIvkTfh0Fz4cwhcPmgdy17RWhio\n9Cugy2zeYYxh1qwg+vZdxtmz1/HzS02vXtUBNEmof3H2h/1REXkTqxUB0AM4Gs/5SiWdsFDYMcrq\nqL4tYwDU+8ZaWlQTxF0OH75Ejx6LWLHC+i9cs2Z+6tYtaHNUyp05myi6Ad8CA7BGP60EuroqKKWc\nEh1pldhY/nrsscwloGJPKPd/4JPOvtjcUHh4FMOHb2To0PWEh0eTObMvI0Y0pnPnSnh5aTJVD+bs\nPIrzQHsXx6KUc2KiIOgn+HMwXD1hHctSEmp9DkWf1RbEA8yYsYdPPlkDQKdOFfjyy8Z3ZlsrFR+X\n9jOISFNgNOANfG+MGXafc9oBg7BaKruMMS+5MiblwWKiYf906xHTlWPWMf9CUOUdqNgdRIdw3isq\nKubO0NZOnSrwxx9H6dq1CvXqBdgbmPIoLksUIuINjMNaFS8Y2CYiC4wx++KcUwz4AHjKGHNZRHK4\nKh7lwYyx5kD8NTZ2mVG/PPDU51DqZZ1JfR8xMYbvv9/B55+vZ9OmzuTNmxFvby9+/rmt3aEpD+TK\nFkV14LAx5iiAiMwE2gD74pzTBRhnjLkMdx5xKRXr1BrYNgKOLbH20+eyynyXeU0TxAPs3n2Obt0W\nsnlzMABTp+7ko4/q2ByV8mTOTrhLA7QFAuK+xxgzOJ635QVOxdkPBu5d1aS44/obsR5PDTLGLHUm\nJpXMXQuG2fWsVeUAUmeA6h9ClbdS3DKjzrpxI4JBg9bwzTd/Eh1tyJXLj1GjmtCuna4vph6Psy2K\n+cAVYDsQnsifXwyoB+QD1olIOWNM6L0nikhXHCOtChQocO/LKrmIuG5NlNv+dWxF1+LtoP431uMm\ndV9r1x6nY8d5nDxpFfDr1asan33WAH9/Tarq8TmbKPIZY5o+5LVPA/njXsNxLK5gYIsxJhI4JiIH\nsRLHtnsvZoyZDEwGq8z4Q8aiPMGJlTCvNUTdtPaLPQcNxoKfVixNSIYMaQgOvkqlSrmYNKkl1arl\ntTsklYw4O0xkk4iUe8hrbwOKiUghEUmNNbx2wT3nzMNqTSAi2bAeRelEvpTm1FqYXhXmNLKShHjD\nMwug9W+aJB4gKiqGefMO3NmvXDk3q1d3YuvWLpokVKJztkVRC3hVRI5hPXq6XWa8/IPeYIyJEpFe\nwDKs/ocpxpggERkMBBpjFjhee1pE9gHRQD+7FkNSNoi8CTNrwfm/Yo9Vew9qDLD6JNR9/flnMN26\nLWTXrnMsWNCeVq1KAFCnjs6uVq7hbKJo9igXN8YsBhbfc2xgnG0DvO34o1KKiGuwezJsHQ63QqwW\nRPHnoe5XkEF/G36Qy5dv8cEHK5k8eTvGQEBAJtKl05FfyvWcnZl9QkRqAcWMMT+KSHZAV5lXDy9k\nN8x5Gm6es/azloEWv0D2h32ymXIYY/j55z28/fZyzp+/QapUXvTrV5MBA+poolBJwtnhsZ8AVYES\nwI+ADzAdeMp1oalkJToC/vzMmhMRHW5Vda0xwOqw1pIb8Zo4MZAePayGee3aBZgwoQVlyujcVJV0\nnH309CxQCdgBYIz5x7F4kVIJO7cDlnWGkF3WfpnXoMG3kFobpc545ZUKTJ68gzffrE6nThW1gJ9K\ncs4mighjjBERAyAiWklMJSz0KCx8Ec4FWvuZikCTKZBPZwnH548/jjJs2AbmzWuPn19q/PxSs2NH\nV0RbXsomzg6PnS0ik4BMItIF+AP4znVhKY8WHQEbB8KU4laSEC8o2xk67tIkEY9z567z8stzadz4\nv6xceYyxY7feeU2ThLKTs53ZI0WkMXAVq59ioDFmhUsjU54p9Aj8/KQ1mgmg4NNQfzRkLWlvXG4s\nJsYwefJ23n//D65
cCcfXNxUDB9bh7beftDs0pYCHKAroSAyaHNT9GQM7x8GqNwEDfnmtNapL/8fu\nyNza7t3n6Nr1d7ZssYoWNGtWlLFjm1O4cGabI1MqVryJQkQ2GGNqicg1rPUi7ryENQ0io0ujU54h\nLBRmPgUXHYWBCzSEptN0ToQTTpwIZcuW0+TJk4HRo5vStm0pfcyk3E68icIYU8vxt45wUvd3aC4s\nfQ0iroJ3Gmud6grddMjrAxhjCAoKoWxZa3hrq1Yl+O67VrRrV4aMGdPYHJ1S9+f0oycRqYxVysMA\nG4wxfyXwFpWc3boIq/vA/hnWvn8haLsMMhezNy43duJEKL17L2HRokMEBnahUiWrjtXrr1e2OTKl\n4ufUqCcRGQhMA7IC2YCpIjLAlYEpN/b3r/BjKStJePlA1X7Q+aAmiQeIjIxmxIiNlC49nt9/P4if\nX2qOHr1sd1hKOc3ZFkUHoJIxJgxARIZhTb77zFWBKTcUHQEr3oCgqdZ+rurQ9EfIWtrWsNzZxo0n\n6dZtEXv3Wos3vvhiGb7+ugl58ujTXOU5nE0UxwFfIMyxnwY44oqAlJs6vhyWvw7XHIsWVnkb6owA\nL29743Jjkydv5403FgJQuHBmxo9vTpMmRW2OSqmHl9CopzFYfRLhQJCIrHDsNwY2uD48ZbuYKPhf\nKzjuWKE2fS5oOUsnzjmhefNiZM2alu7dq/Lhh7VJm1YL+CnPlFCLwlF7ge3A/+IcX+OSaJR7uXYa\nJueL3a/SF2oN1TWrH+DAgQuMHbuV0aOb4u3tRb58GTl2rA8ZMuhoJuXZEhoeOy2pAlFuxBhr2Ouy\nztZ+6ozwzHzIX8/WsNzVrVuRDB26nuHDNxIZGUOZMtnp3r0agCYJlSw4PTxWpRAX98GSjnBuu7Wf\n+wloPRf88tgbl5tavvwIPXos4sgRaxTT669Xol27MjZHpVTi0kShYh2aayWJyBvWsNfaw6Dym+Cl\n3yb3OnPmGn37LmPWrCAAypTJzsSJLalVq4DNkSmV+PQngLKsfQ8Cv7S2Cz4NzadDuuz2xuTGfvtt\nP7NmBZE2bSoGDapH37418PHREWAqeXJ2hbviQD+gYNz3GGMauCgulVSiI2BFVwhydEdVeRvqDNdW\nxH2EhoaRKZPVkd+9e1WOHbtM795PEBCQyebIlHItZ38a/ApMxFqDItp14agkdfUEzHsGQnZa+0//\nAOU62xuTG7p6NZyBA1czbdou9u7tTt68GfH29uKrr5rYHZpSScLZRBFljJng0khU0om8Abu/g82D\nIPwKpM+vXUXgAAAgAElEQVRtzbAO0B98cRlj+O23/fTps5R//rmGl5ewcuUxOnasYHdoSiUpZxPF\n7yLSA2suRfjtg8aYSy6JSrlOyG6Y3QDCLlr7BRpCy9mQNou9cbmZY8cu06vXEhYvPgRA9ep5mTix\nxZ1CfkqlJM4mik6Ov/vFOWaAwokbjnIZY2D717DhQ6tfwr8Q1B4OxZ/XkuD3mD59N127/s6tW1H4\n+6fhiy8a0rVrFby9nV05WKnkxdmlUAu5OhDlQuFXYfFLcHSRtV+6IzQcB6n97I3LTZUunZ2IiGg6\ndCjL1183IVcuvU8qZXN21JMP0B24XeBnDTDJGBPporhUYjm2FP7oDlePW/vNZ0Cpl2wNyd1cuHCT\nWbP20rNndQAqV87N/v09KVYsq82RKeUenH30NAHwAcY79l9xHHvdFUGpRGAMBH4F6/uDiYEspaDZ\nT5Crqt2RuQ1jDNOm7eLdd5dz8eItChTwp1WrEgCaJJSKw9lEUc0YE3eoxyoR2eWKgFQiiLgGy/4P\nDv5q7Zf9P2gwBnzS2huXG9m3L4Tu3Rexbt0JABo0KETx4poclLofZxNFtIgUMcYcARCRwuh8Cvd0\n8wLMrmvVbPLxg3pfQfmudkflNm7ejOTzz9fx5ZebiIyMIXv2dHz9dRNefrkcop36S
t2Xs4miH7Ba\nRI4CgjVD+zWXRaUeza2LMK0M3DwPmYtDm3mQtZTdUbmVkSM3MXSotZTKG29U4YsvGpI5s7a0lIqP\ns6OeVopIMaCE49Dfxpjw+N6jkljIHpjX2koS6XLAc4shUxG7o3ILMTEGLy+rtfD220+yZctpBgyo\nzZNP5rc5MqU8Q0Ir3DUwxqwSkefueamoiGCMmevC2JSzji6yVqHDQOZi0HY5+AfYHZXtoqNjGD9+\nG5Mn72Dz5v/Dzy81fn6pWbRIR30p9TASalHUBVYBre7zmgE0Udjt/E5Y+CJgoEADxyxr7ZQNDPyH\nbt0Wsn37GQBmzw6ic+dKNkellGdKaIW7Txybg40xx+K+JiIJTsITkabAaMAb+N4YM+wB57UF5mCN\nrgq83znqPi79DXOetmo3FX/BWss6hXfIXrkSxoABqxg3bhvGQP78GRkzphlt2pS0OzSlPJazndm/\nAZXvOTYHqPKgN4iINzAOaAwEA9tEZIExZt8952UA+gBbnA1aAaFH4ZenrJpNGQOg6bQUnyQWLTpI\nly6/c+bMdby9hb59a/DJJ/Xw80ttd2hKebSE+ihKAmUA/3v6KTICvglcuzpw2Bhz1HGtmUAbYN89\n5w0BhnN3HSkVn1sXYV4rR5IoCO3X6xwJrE7rM2euU6NGPiZObEGFCrnsDkmpZCGhFkUJoCWQibv7\nKa4BXRJ4b17gVJz9YOCJuCeISGUgvzFmkYjEmyhEpCvQFaBAgRS83GTYZZhWDm6cgXQ54eVtKXYl\nuvDwKNatO0HjxtborlatSrB48Us0aVL0zignpdTjS6iPYr6ILAT6G2OGJuYHi4gX8DXwqjPnG2Mm\nA5MBqlatahIzFo8REw0LnrOSRJaS8OyiFJsk1q49Trduizh48CKBgV3ulP9u1qyYzZEplfwkWDfZ\nGBON1c/wsE4DcQeq53Mcuy0DUBZYIyLHgRrAAhHRYkQPsvFjOLUGfLNaSSJTyqvyHhJyg1dfnUe9\netM4cOACRYtmITxciwQo5UrOdmZvEpGxwCzgxu2Dxpgd8bxnG1DMMTrqNNAeuDOA3RhzBch2e19E\n1gDv6qin+zAxsHUYbP3C2m88KcUliZgYw48//sV77/3BpUu3SJPGmw8/rE3//k+RJo2u762UKzn7\nP6ym4+/BcY4ZoMGD3mCMiRKRXsAyrOGxU4wxQSIyGAg0xix4lIBTnOgIWNAWji609uuPguJt7Y3J\nBoMGrWHIkHUANGpUmPHjm2uFV6WSiBjjeY/7q1atagIDk3nDwxhrnsSat+D4MvDNDI2/S5FJAiA4\n+Cr1609j8OB6tG9fVgv4KfWQRGS7MeaRHu07u3CRP/AJsQsXrcWahHflUT5UJcAYmNscji+19sXL\nKvCXr07870tGFi06yNSpu5g5sy3e3l7ky5eRAwd66nKkStnA2f91U7CGxLZz/LkK/OiqoFK8pa/G\nJonsFaDN/BSTJIKDr9K27WxatvyFOXP28csve++8pklCKXs420dRxBgT95nHpyKy0xUBpWhR4bBt\nOOz7ydpv8QuUbG9vTEkkKiqGMWO2MHDgGq5fjyB9eh+GDKlP+/Zl7Q5NqRTP2URxS0RqGWM2AIjI\nU8At14WVAl09YdVtunzQ2q/8VopJElu3nuaNNxayc+dZAJ57rhSjRjUhf35/myNTSoHziaI7MM3R\nVyHAJaCTy6JKacJCrTLhlw9CmkzQeDKUeMHuqJLMli3B7Nx5loIF/Rk7tjktWxa3OySlVBzOLly0\nE6ggIhkd+1ddGlVKEhUO/2sBF/ZYq9K99Kc1wikZM8Zw+PClO8Nbe/SoRlRUDF27ViF9ei3gp5S7\ncap3UESyisi3wBqsJVFHi4gOYn9cUeEwsxb8swlSpYW2S5N9kjh8+BJNmkynSpXJnD5t/b7h7e1F\n375PapJQyk05O4xkJhACtAWed2zPclVQKcaKr
nAuEFJngBdWgX+CS3x4rPDwKAYPXkvZsuNZseIo\nPj7eHDhwwe6wlFJOcLaPIosxZkic/c9E5BlXBJQihB6BZZ0heB14+cALKyFXNbujcplVq47RvbtV\nwA+gU6cKfPllY7JnT29zZEopZzibKFaLSHtgtmP/eWCRa0JK5sKvwPQq1t/eqeHpH5J1kvj883UM\nGLAagJIlszFhQgvq1QuwNyil1ENx9tHTG8DPQITjz0zgbRG5JiLase2s8Cswo5r1d+qM8NoBKP0f\nu6NyqWbNipE+vQ+ffVafnTvf0CShlAdydtRTBlcHkuzFRMHsBnD5EKRKBy9tSZZ9Ert3n2POnH0M\nHlwfgMqVc3PqVF8yZ9YV+JTyVE7XZxaR1sTWelpjjFnompCSocgbMLcFnN8B3mmgwybIWtLuqBLV\n9esRfPrpGr755k+iow1PPJGXFi2s+RCaJJTybM4WBRwGVANmOA71EZGnjDEfuCyy5MIYWPoaBK+1\nJtO1mgM5KtgdVaKaP/8AvXsv4dSpq4hAr17VqFUrBS9Xq1Qy42yLojlQ0RgTAyAi04C/AE0UCdk5\nDg7+Cl6prCGwOSvZHVGiOXnyCm++uYT58/8GrMdMkya1pGrVPDZHppRKTA+zNFgmrNIdAFqExxln\ntsDqPtZ24++SVZIAGDt2K/Pn/02GDKn57LMG9OxZTSu8KpUMOZsovgD+EpHVWLWe6gDvuyyq5ODG\nOZj/rLWMaemOUPZVuyNKFNevR+DnZ82gHjiwLteuhTNgQB3y5s1oc2RKKVdJcIU7sZYSywdEYfVT\nAGw1xpx1cWwP5PYr3N26BNPKwo0z1noSHTaBTzq7o3osly/f4oMPVrJ06WH27u1xJ1kopTyDS1e4\nM8YYEZlnjKkC6DrXCYm8Bf9rbiUJ7zTWynQenCSMMcyYsYd33lnO+fM38PHxYv36EzRrVszu0JRS\nScTZR09/ikg1Y8w2l0bj6UwM/NbE6psAeHYh+AfYGtLj+PvvC/TosZhVq44BULt2ASZObEnp0tlt\njkwplZScTRT1gW4ichy4gdVPYYwx5V0VmEda2QtOr7eK/LVb69Gd12PHbuWdd5YTERFN1qxp+fLL\nxrz6akWsJ5FKqZTE2UTRzKVRJAf7f4ZdE6ztRpM8OkkAFCzoT0RENJ07V2T48MZky+a5j8+UUo8n\n3kQhIr5AN6AosAf4wRgTlRSBeZQjv8Pil63tGh9DqQ72xvMIzp69zqpVx3jppXIAtGpVgj17ulO2\nbA6bI1NK2S2hFsU0IBJYj9WqKA30cXVQHuXS37CgrbVduBXU/NTeeB5STIxh0qRAPvhgJdeuRVCq\nVDYqVcoNoElCKQUknChKG2PKAYjID8BW14fkQc79BT9Xtwr+5a0Nz8wDD3qGv3PnWbp1W8iWLacB\naNasqNZlUkr9S0KJIvL2hjEmSjsy4wjZA7PrWkkiU1FoPgPEM2YlX7sWziefrGH06C3ExBjy5MnA\nt9825bnnSmlntVLqXxJKFBXirDchQFrH/u1RTylzOm74FZhe2UoSBZ+G1r9Baj+7o3Jav34rmDRp\nO15eQp8+TzB4cH0yZkxjd1hKKTcVb6IwxngnVSAeIzoCZjxhJYn0uaDlTI9IEsaYO62FAQPqcOjQ\nJb78sjGVK+e2OTKllLvzjGcl7sIYWPEGXP4bUqWF55aAb2a7o4pXZGQ0I0ZspH79aURHxwCQL19G\nVq7sqElCKeWUh6keq7YOh6Cp1vZzSyBHRVvDScjGjSfp1m0Re/eeB2D58iNaekMp9dA0UThr53jY\n4Fh+o/kMyF/X3njicfHiTd5//w++//4vAIoUycy4cc1p0qSozZEppTyRJgpnBP0EK3ta21XegVIv\n2RtPPGbO3Evv3ku4cOEmPj5e9O//FB9+WJu0aX3sDk0p5aFc2kchIk1F5G8ROSwi/1q/QkTeFpF9\nIrJbRFaKS
EFXxvNIgjfAss7WduW3oN5Ie+NJQEjIDS5cuEm9egHs3t2dIUMaaJJQSj0Wl7UoRMQb\nGAc0BoKBbSKywBizL85pfwFVjTE3RaQ7MAJ40VUxPbToCFjZA0w0lOkE9b+xO6J/uXUrkp07z/Lk\nk/kB6NGjGvnyZeSZZ0rqnAilVKJwZYuiOnDYGHPUGBMBzATaxD3BGLPaGHPTsfsn1gJJ7mPXJLiw\nBzIUgHrulySWLTtM2bITaNJkOqdPW9NdvL29ePZZnTinlEo8rkwUeYFTcfaDHcce5P+AJS6M5+Ec\nXgBr3rK2a3zkVsNgz5y5Rvv2c2jadAZHj16mYMFMXLp0y+6wlFLJlFt0ZovIf4CqwAOHEolIV6Ar\nQIECBVwb0KWDsLSTtRBRgUZQrotrP89J0dExTJgQyEcfreLq1XDSpk3FoEH16Nu3Bj4+OjdSKeUa\nrkwUp4H8cfbzOY7dRUQaAR8BdY0x4Q+6mDFmMjAZrDWzEzfUOMJCYVZtCA+FfHWh7RK3KfTXrdvC\nO0NeW7YszpgxzQgIyGRzVEqp5M6Vj562AcVEpJCIpAbac8+a2yJSCZgEtDbGnHdhLM6JibKWMr15\nHrKUhDb/Ay+3aHQB0L17NQoW9Gfu3HYsWNBek4RSKkm47Kego9psL2AZ4A1MMcYEichgINAYswD4\nEvADfnV0vp40xrR2VUwJ2vgxnN0KCLRdamu/hDGG337bz+rVxxg3rgUAlSvn5vDhN0mVSiuvKKWS\njkt/XTbGLAYW33NsYJztRq78/IdyeL5VogOgyQ+Q0b4pHUePXqZXr8UsWXIYgOefL039+oUANEko\npZKc+zxXsdONczD/GWv7iQ+h7Gu2hBEREc1XX21i8OB1hIVF4e+fhmHDGlGnjvvNQ1RKpRyaKIyB\n35+3tvPWgqc+syWM9etP0K3bIvbtCwHgpZfK8dVXT5Mrl/uXMFdKJW+aKIKmwekNkDYbPP29bSOc\n5s07wL59IRQtmoXx45vTuHERW+JQSql7pexEcflQbB2nmoMhS4kk+2hjDKdOXaVAAX8APv20Pjly\npKdPnxr4+qbsfxallHtJuT2jxjgqwhrIXw8qdEuyj963L4S6dadSq9YUrl+PAMDPLzX9+9fSJKGU\ncjspN1HsGA0nVoB3miR75HTzZiQffriSChUmsn79ScLDozlw4ILLP1cppR5Hyvz1NSYKtg6ztmsN\nhUyu7w9YsuQQPXsu5tixUADeeKMKX3zRkMyZ07r8s5VS6nGkzESxth/cPAfpc0GFN1z+cX37LmXU\nqC0AlC+fk4kTW9wpC66UUu4u5T16OhsIO0ZZ2/VGgU96l39kgwaFSJ/eh5EjG7N9e1dNEkopj5Ky\nWhTRkbCmr7Vd6mUo6Zo1kgID/2Hz5lP07v0EAK1aleDYsT5kz+76pKSUUoktZSWKte9acyZ8s1it\niUR25UoYAwasYty4bYgItWsXpGLFXACaJJRSHivlJIqji2HnWECg2U+QLluiXdoYw+zZQbz11jLO\nnr2Ot7fw9ttPUrRolkT7DKWUskvKSBS3LsKi9tZCRJXfgsItEu3SR45comfPxSxbdgSAJ5/Mx8SJ\nLSlfPmeifYZSStkpZSSKLUMh4hpkKwu1v0jUS3/88WqWLTtCpky+DB/eiNdfr4yXl3ssdKSUXSIj\nIwkODiYsLMzuUFIcX19f8uXLh4+PT6JdM/knioO/wfavre36oyGV72Nf8tatSNKmtf4RRoxoTLp0\nPgwd2pAcObQfQimA4OBgMmTIQEBAAOImK0SmBMYYLl68SHBwMIUKFUq06ybv4bE3L8CSjtZ22c6Q\nv/5jXS4k5AadOs2jbt2pREfHAJAvX0a+/761Jgml4ggLCyNr1qyaJJKYiJA1a9ZEb8kl7xbF1i8g\n6ibkfvKxynTExBimTPmL995bweXLYaRJ482OHWeoVi1vIgesVPKhScIerrj
vyTdRhOyOnVhXd+Qj\nJ4m9e8/TrdtCNm48BUDjxoUZP76FjmhSSqUYyfPRU3QELOpgjXIq+RLkrflIlxkyZC2VKk1i48ZT\n5MyZnl9+acuyZf/RJKGUBzh79izt27enSJEilC5dmubNm3Pw4MFEuXa9evUIDAx8pPceP36ctGnT\nUrFiRUqXLk3Hjh2JjIy88/qGDRuoXr06JUuWpGTJkkyePPmu9//000+ULVuWMmXKULp0aUaOHPlY\nX4szkmeiWN0XLu4D36xWa+IR+fv7Eh0dQ48eVTlwoBft25fV5rRSHsAYw7PPPku9evU4cuQI+/bt\nY+jQoZw7d+6xrx0dHf3Y1yhSpAg7d+5kz549BAcHM3v2bMBKbi+99BITJ07kwIEDbNiwgUmTJrFo\n0SIAlixZwqhRo1i+fDlBQUHs2LEDf3//x44nIckvUVzcD7vGW9utfwO/3E6/NTj4KosXH7qz37Nn\nNXbseINx41qQKdPjj5ZSSiWN1atX4+PjQ7dusevMVKxYkdq1a2OMoV+/fpQtW5Zy5coxa9YsANas\nWUPLli3vnN+rVy+mTp0KQEBAAIMHD6ZWrVr8+uuvAEyfPp2aNWtStmxZtm7dCsCNGzfo3Lkz1atX\np1KlSsyfPz/eOL29valevTqnT58GYNy4cbz66qtUrlwZgGzZsjFixAiGDbOqXX/xxReMHDmSPHny\nANZQ2C5dujzu7UpQ8uqjOLcDplextgs1h/x1nXpbVFQMY8ZsYeDANYjA/v09yZs3I97eXndKcCil\nHtFXLmqFv2Me+NLevXupUqXKfV+bO3cuO3fuZNeuXVy4cIFq1apRp06dBD/O19eXDRs2ADBx4kRu\n3LjBpk2bWLduHZ07d2bv3r18/vnnNGjQgClTphAaGkr16tVp1KgR6dPff1RkWFgYW7ZsYfTo0QAE\nBQXRqVOnu86pWrUqQUFBCX5drpR8WhQhe+Bnqwgf/oWh+Qyn3rZ162mqVfuOt99ezvXrETRuXEQn\nzCmVjG3YsIEOHTrg7e1Nzpw5qVu3Ltu2bUvwfS++eHcR0Q4dOgBQp04drl69SmhoKMuXL2fYsGFU\nrFiRevXqERYWxsmTJ/91rSNHjlCxYkWyZs1KgQIFKF++fOJ8cS6SfFoU69+3FiRKlwOeXw6+meI9\nPTQ0jA8/XMnEiYEYAwUL+jN2bHNatiyeRAErlULE85u/q5QpU4Y5c+Y81HtSpUpFTEzMnf175yLc\n2yq4t79SRDDG8Ntvv1GiRIl4P+t2H8XZs2epW7cuCxYsoHXr1pQuXZrt27fTpk2bO+du376dMmXK\n3Pm6tm/fToMGDR7qa3tcyaNFcesSHFtsbb/0p1Mr1r3yyv+YMCEQb28v+vd/iqCgHpoklEomGjRo\nQHh4+F0jhrZt28batWupXbs2s2bNIjo6mpCQENatW0f16tUpWLAg+/btIzw8nNDQUFauXBnvZ9zu\n29iwYQP+/v74+/vTpEkTxowZgzFWcvzrr7/ivUauXLkYNmwYX3xhlRbq2bMnU6dOZefOnQBcvHiR\n/v3789577wHwwQcf0K9fP86ePQtAeHg433777SPcoYeTPFoUM6pZf2cvD/4PnrZujLnzW8Cnn9bj\n2rVwxo5tTtmyOZIiSqVUEhER/ve///HWW28xfPhwfH19CQgIYNSoUdSpU4fNmzdToUIFRIQRI0aQ\nK5fVF9muXTvKly9P8eLFqVSpUryfkTlzZmrWrMnVq1eZMmUKAB9//DFvvfUW5cuXJyYmhkKFCrFw\n4cJ4r/PMM88waNAg1q9fT+3atZk+fTpdunTh2rVrGGN46623aNWqFQDNmzfn3LlzNGrU6M7Ps86d\nOyfCHYuf3M58nqRq1armzhjm31+Ag44m5nNLoFDTf50fHh7FsGEbOHjwEjNmPJeEkSqVMu3fv59S\npUrZHUaKdb/7LyLbjTFVH+V6nt2iuH4
mNkkUbnXfJLFq1TG6d1/EwYMXAejXr6aOZFJKqYfg2X0U\nfzmezWWvAM/cPV753LnrvPLK/2jY8CcOHrxIyZLZWL26kyYJpZR6SJ7borh8CLZak1CoP/quWk4/\n/LCDd99dQWhoGL6+qRgwoDb9+j1F6tTeNgWrVMoTt09QJR1XdCd4ZqIwBn5tZG0Xf+FfE+uCgkII\nDQ2jSZMijBvXnCJFtDaTUknJ19eXixcvaqnxJHZ7PQpf38StJOGZndkl85jALmdAvOG1/dxIXZCj\nRy9Trpy1/Oj16xEsX36EZ58tqd+kStlAV7izz4NWuHuczmzPTBT5xQS+BdT/lgXBT9O79xKMMezb\n1xM/v9R2h6eUUm7ncRKFSzuzRaSpiPwtIodF5P37vJ5GRGY5Xt8iIgHOXvvkZX+eGZSFNm1mcvLk\nFbJnT09IyI3EDF8ppRQubFGIiDdwEGgMBAPbgA7GmH1xzukBlDfGdBOR9sCzxpgX73vBOPJn8jOX\nwt7mZrg3GTKk5vPPG9CjRzW8vT17EJdSSrmKu86jqA4cNsYcBRCRmUAbYF+cc9oAgxzbc4CxIiIm\ngewVfCUj4M0LL5Tmm2+akDdvxkQPXimllMWViSIvcCrOfjDwxIPOMcZEicgVICtw4d6LiUhXoKtj\nNxwG7f31V3CUhk/JsnGf+5UC6X2Ipfcilt6LWPFXKoyHxwyPNcZMBiYDiEjgozahkhu9Fxa9D7H0\nXsTSexFLRB5t7VZc25l9GsgfZz+f49h9zxGRVIA/cNGFMSmllHpIrkwU24BiIlJIRFID7YEF95yz\nALi9nNPzwKqE+ieUUkolLZc9enL0OfQClgHewBRjTJCIDAYCjTELgB+A/4rIYeASVjJxxuSET0kx\n9F5Y9D7E0nsRS+9FrEe+Fx454U4ppVTS0YkHSiml4qWJQimlVLzcNlG4svyHp3HiXrwtIvtEZLeI\nrBSRgnbEmRQSuhdxzmsrIkZEku3QSGfuhYi0c3xvBInIz0kdY1Jx4v9IARFZLSJ/Of6fNLcjzqQg\nIlNE5LyI7H3A6yIi3zru1W4RqZzgRY0xbvcHq/P7CFAYSA3sAkrfc04PYKJjuz0wy+64bbwX9YF0\nju3uKfleOM7LAKwD/gSq2h23jd8XxYC/gMyO/Rx2x23jvZgMdHdslwaO2x23C+9HHaAysPcBrzcH\nlgAC1AC2JHRNd21R3Cn/YYyJAG6X/4irDTDNsT0HaCjJs6Z4gvfCGLPaGHPTsfsn1pyV5MiZ7wuA\nIcBwIDnXuHbmXnQBxhljLgMYY84ncYxJxZl7YYDbtX78gX+SML4kZYxZhzWK9EHaAD8Zy59AJhHJ\nHd813TVR3K/8R94HnWOMiQJul/9Ibpy5F3H9H9ZvC8lRgvfC0YzOb4xZlJSB2cCZ74viQHER2Sgi\nf4rIvxeVTx6cuReDgP+ISDCwGOidNKG5pYf9meI5JTxUwkTkP0BVoG5C5yZHIuIFfA28anMo7iIV\n1uOnelitzHUiUs4YE2prVPboAEw1xnwlIk9izd8qa4yJsTswT+CuLQot/xHLmXuBiDQCPgJaG2PC\nkyi2pJbQvcgAlAXWiMhxrOevC5Jph7Yz3xfBwAJjTKQx5hhW2f9iSRRfUnLmXvwfMBvAGLMZ8MUq\nGJgSOfUzJS53TRRa/iNWgvdCRCoBk7CSRHJ9Dg0J3AtjzBVjTDZjTIAxJgCrv6a1MeaRi6G5MWf+\nj8zDak0gItmwHkUdTcogk4gz9+Ik0BBAREphJYqQJI3SfSwAOjpGP9UArhhjzsT3Brd89GRcW/7D\nozh5L74E/IBfHf35J40xrW0L2kWcvBcpgpP3YhnwtIjsA6KBfsaYZNfqdvJevAN8JyJ9sTq2X02m\nv1giIr9g/YKQzdEn8wngA2CMmYjVR9McOAzcBF5L8JrJ9F4ppZRKJO766EkppZSb0EShlFIqXpoo\nlFJ
KxUsThVJKqXhpolBKAdZ8JBHpJSJp7I5FuRdNFCpJiUi0iOwUkb0i8quIpEviz39GRErH2R/s\nmKxoKxEZJCLvPsT51xP58wUYBexOxhM21SPSRKGS2i1jTEVjTFkgAugW90XHJCCXfF86ZvA/g1U9\nFABjzEBjzB+u+DxP4igQ18tRUE6pu2iiUHZaDxQVkQAR2S8i44EdQH4R6SAi/9/e2YZWWYZx/Pd3\niww3Ty8OwQ8mFBVYIEURpUivHyIqUjBa0SICizSLIL8UJbGSjPxgJZWxQiM/9LKGQVI5SbB8CbcT\n1PzQlD4I+SGGswJxVx+u67Cnw9mZbOE2d/3g5rnPdb889/Ps7Fy77mfnf5Uj8lhfGSBpUNIbkn6K\n3BstYV8Uwne9kj6XdFHYuyW1S9oNPA/cA7weUc1lkjokLY++t0W+gnJo+p8f9iOSXo5zliVdVX0h\nktokfSGpS1J/bOE8G/P9IOni6Pe4pP2SeiR9OlpEJWluXE9PlJuq2pviPlTWdm/YZ0naEWN+lrQi\n7KQzQQgAAAMtSURBVK9pOHfJhrC1xFr2R7m5MMcHkvbFddRS6k2mAxOtnZ5lehVgMI6NQCeeP2MB\nMATcGG3zcMmFluj3HXBftBnQGvUXgU1R7wWWRn0dsDHq3cDbhfN3AMurX+OSDr8DV4T9I2BN1I8A\nq6L+JPB+jetqw7/p2hzrHgBWRtubhbkuKYx5pTDvS8BzNebdXhjbAJRq3MfZUZ8TaxCwDHivME8J\nV1fuY/iLthfG8WNgcdTnA79EvR14qNIX14qaNdHvoSxnv2REkZxtLpB0CDiAO4MtYT9qro0PcD3Q\nbWbHzSXkt+HJWMAdyvaobwUWSyrhH3q7w/5hoT+F/vW4Eug3s8MjzPFZHA/ijq0Wu8zshJkdxx1F\nV9jLhTFXS/peUhloBRaOsq5bgXcAzOy0mQ1UtQtol9QLfIPLRc+Nc94hab2kJTFuAM/RsUXS/bh8\nA8DtwKb4uXwJzJbUBNwJrA17N+5M54+y3uQcZFJqPSXnNH+b2aKiIfSpTo5xvjPRoBnr3EUqD3hP\nM/LvTfEh8FDh9VBhTAceHfVIaiNE+8ZBKx7BXGdmp+SquTPN7LA8N8ddwKuSdprZOkk34OJ4DwBP\n4Y5oBh7N/SfRUzzgXmZmfeNcYzLFyYgimYzsA5ZKmiOpAc8lUIkWZuBbRQAPAnvir+U/JS0J+8OF\n/tWcwLeHqukDFki6/AzmGA/NwDFJ5+Ef8qPxLb49h6SGiJ6KlIA/wkncAlwafecBf5nZVmADcG1E\nCSUz+wpYA1Qc9k4KiXwkVexfA6vCYVRUipNpSEYUyaTDzI5JWgvswrdWdphZZzSfBBZKOohvpawI\n+yPA5ng4/BsjK2J+gquIrmbY4WBm/0h6FFfgbcSlqzf/z5cG8ALwI3AU3x6q5bSKPA28K+kxPJp5\nAthbaN8GdEk6ABwCfg37NfhD+yHgVIxrBjolzcTv6zPRdzXwVmxfNeL5xlfiKWU3Ar3xn2j9wN1j\nvO5kCpPqscmUQtKgmTVN9DqSZDqRW09JkiRJXTKiSJIkSeqSEUWSJElSl3QUSZIkSV3SUSRJkiR1\nSUeRJEmS1CUdRZIkSVKXdBRJkiRJXf4FY0VYGrgXtJUAAAAASUVORK5CYII=\n", "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.figure()\n", "lw = 2\n", "plt.plot(fpr, tpr, color='darkorange', lw=lw, label='Courbe ROC')\n", "plt.plot([0, 1], [0, 1], color='navy', lw=lw, 
linestyle='--')\n", "plt.xlim([0.0, 1.0])\n", "plt.ylim([0.0, 1.05])\n", "plt.xlabel(\"Proportion mal class\u00e9e\")\n", "plt.ylabel(\"Proportion bien class\u00e9e\")\n", "plt.title('ROC')\n", "plt.legend(loc=\"lower right\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["This is only a linear model, so I am sure you will do better; there is also the country, the date, the other ingredients, in short quite a lot of text to exploit."]}, {"cell_type": "code", "execution_count": 43, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4"}}, "nbformat": 4, "nbformat_minor": 2}