{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.ml - Sentiment analysis - correction\n", "\n", "This is by now a classic machine learning problem: on one side some text, on the other a rating, most often binary (positive or negative) but which could be graded."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## The data\n", "\n", "We retrieve the data from the UCI site [Sentiment Labelled Sentences Data Set](https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentences) through the function ``load_sentiment_dataset``."]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/html": ["
"], "text/plain": [" sentance sentiment \\\n", "0 So there is no way for me to plug it in here i... 0 \n", "1 Good case, Excellent value. 1 \n", "2 Great for the jawbone. 1 \n", "3 Tied to charger for conversations lasting more... 0 \n", "4 The mic is great. 1 \n", "\n", " source \n", "0 amazon_cells_labelled \n", "1 amazon_cells_labelled \n", "2 amazon_cells_labelled \n", "3 amazon_cells_labelled \n", "4 amazon_cells_labelled "]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["from ensae_teaching_cs.data import load_sentiment_dataset\n", "df = load_sentiment_dataset()\n", "df.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 1: the tf-idf approach\n", "\n", "The target is the *sentiment* column; the two other columns are the features. You will need the preprocessors [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html), [OneHotEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html), [TF-IDF](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html). One of them has been unnecessary since version [0.20.0](http://scikit-learn.org/stable/whats_new.html#sklearn-preprocessing) of *scikit-learn*. 
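As a hedged aside (this sketch is ours, not the exercise's expected solution): since scikit-learn 0.20, `OneHotEncoder` accepts string columns directly, which is why `LabelEncoder` is the step that is no longer needed, and both preprocessors can be wired together with `ColumnTransformer`. Column names mirror this dataset; the toy data below is made up for illustration.

```python
# Hedged sketch: ColumnTransformer applies OneHotEncoder to the categorical
# column and TfidfVectorizer to the raw text column, in one fitted object.
import pandas
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer

toy = pandas.DataFrame({
    "sentance": ["Good case, Excellent value.", "The mic is great."],
    "source": ["amazon_cells_labelled", "yelp_labelled"],
})
ct = ColumnTransformer([
    ("cat", OneHotEncoder(), ["source"]),    # 2-D input: list of column names
    ("txt", TfidfVectorizer(), "sentance"),  # 1-D input: a single column name
])
features = ct.fit_transform(toy)
print(features.shape)  # (rows, number of sources + vocabulary size)
```

Note the column specification: `TfidfVectorizer` expects a 1-D array of strings, so its column is given as a bare name, while `OneHotEncoder` expects a 2-D input, hence the list.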
We handle the categorical variables."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### The categorical variable\n", "\n", "This would be a bit simpler with the [Category Encoders](http://contrib.scikit-learn.org/categorical-encoding/) module or the latest scikit-learn addition: [ColumnTransformer](http://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer)."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": ["from sklearn.model_selection import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " df.drop(\"sentiment\", axis=1), df[\"sentiment\"])"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/plain": ["(2250,)"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n", "le = LabelEncoder()\n", "le.fit(X_train[\"source\"])\n", "X_le = le.transform(X_train[\"source\"])\n", "X_le.shape"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": ["X_le_mat = X_le.reshape((X_le.shape[0], 1))"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/plain": ["OneHotEncoder()"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["ohe = OneHotEncoder(categories=\"auto\")\n", "ohe.fit(X_le_mat)"]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": ["X_le_encoded = ohe.transform(X_le_mat)\n", "train_cat = X_le_encoded.todense()\n", "test_cat = ohe.transform(le.transform(X_test[\"source\"]).reshape((len(X_test), 1))).todense()"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/html": ["
"], "text/plain": [" sentance source \\\n", "0 Now we were chosen to be tortured with this di... imdb_labelled \n", "1 Woa, talk about awful. imdb_labelled \n", "\n", " amazon_cells_labelled imdb_labelled yelp_labelled \n", "0 0.0 1.0 0.0 \n", "1 0.0 1.0 0.0 "]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["import pandas\n", "X_train2 = pandas.concat([X_train.reset_index(drop=True),\n", " pandas.DataFrame(train_cat, columns=le.classes_)],\n", " sort=False, axis=1)\n", "X_train2.head(n=2)"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/html": ["
"], "text/plain": [" sentance source \\\n", "0 It looks very nice. amazon_cells_labelled \n", "1 As a European, the movie is a nice throwback t... imdb_labelled \n", "\n", " amazon_cells_labelled imdb_labelled yelp_labelled \n", "0 1.0 0.0 0.0 \n", "1 0.0 1.0 0.0 "]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["X_test2 = pandas.concat([X_test.reset_index(drop=True),\n", " pandas.DataFrame(test_cat, columns=le.classes_)],\n", " sort=False, axis=1)\n", "X_test2.head(n=2)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Tokenization\n", "\n", "We tokenize with the [spacy](https://spacy.io/usage/spacy-101#annotations-token) module, which requires extra data to split text into words: install it with ``pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz`` following the instructions given in the [getting started guide](https://spacy.io/usage/models), or simply run ``python -m spacy download en``. The [gensim](http://www.xavierdupre.fr/app/papierstat/helpsphinx/notebooks/artificiel_tokenize.html?highlight=tokenisation#gensim) module requires no extra installation. 
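To see what a tokenizer produces at all, here is a hypothetical pure-Python stand-in (the name `simple_tokenize` is ours), similar in spirit to gensim's `simple_preprocess`; it is only an illustration, not what spacy actually does:

```python
import re

def simple_tokenize(text):
    # Hypothetical helper: lowercase the text and keep runs of letters
    # (and apostrophes), dropping punctuation and whitespace.
    return re.findall(r"[a-z']+", text.lower())

print(simple_tokenize("Good case, Excellent value."))
# ['good', 'case', 'excellent', 'value']
```

Unlike this sketch, spacy also keeps punctuation as tokens and handles contractions properly, which is why a real tokenizer is preferred below.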
We can also draw on the [pretrained word2vec](http://www.xavierdupre.fr/app/papierstat/helpsphinx/notebooks/text_sentiment_wordvec.html#word2vec-pre-entraines) example."]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": ["import spacy\n", "nlp = spacy.load(\"en_core_web_sm\")\n", "# This works after installing the corresponding corpus:\n", "# python -m spacy download en_core_web_sm"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/plain": ["['Now',\n", " 'we',\n", " 'were',\n", " 'chosen',\n", " 'to',\n", " 'be',\n", " 'tortured',\n", " 'with',\n", " 'this',\n", " 'disgusting',\n", " 'piece',\n", " 'of',\n", " 'blatant',\n", " 'American',\n", " 'propaganda',\n", " '.',\n", " ' ']"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["doc = nlp(X_train2.iloc[0,0])\n", "[token.text for token in doc]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### tf-idf\n", "\n", "Once the words are tokenized, we can apply *tf-idf*."]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": ["from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer\n", "from sklearn.pipeline import make_pipeline\n", "tokenizer = lambda text: [token.text.lower() for token in nlp(text)]\n", "count = CountVectorizer(tokenizer=tokenizer, analyzer='word')\n", "tfidf = TfidfTransformer()\n", "pipe = make_pipeline(count, tfidf)"]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/plain": ["Pipeline(steps=[('countvectorizer',\n", " CountVectorizer(tokenizer=<function <lambda> at 0x000001DCC8835488>)),\n", " ('tfidftransformer', TfidfTransformer())])"]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["pipe.fit(X_train[\"sentance\"])"]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"data": 
{"text/plain": ["<2250x4495 sparse matrix of type '<class 'numpy.float64'>'\n", "\twith 29554 stored elements in Compressed Sparse Row format>"]}, "execution_count": 16, "metadata": {}, "output_type": "execute_result"}], "source": ["train_feature = pipe.transform(X_train2[\"sentance\"])\n", "train_feature"]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": ["test_feature = pipe.transform(X_test2[\"sentance\"])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Combining all the variables"]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [{"data": {"text/plain": ["((2250, 4495), (2250, 3))"]}, "execution_count": 18, "metadata": {}, "output_type": "execute_result"}], "source": ["train_feature.shape, train_cat.shape"]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": ["import numpy\n", "np_train = numpy.hstack([train_feature.todense(), train_cat])\n", "np_test = numpy.hstack([test_feature.todense(), test_cat])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Fitting a model"]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [{"data": {"text/plain": ["RandomForestClassifier(n_estimators=50)"]}, "execution_count": 20, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.ensemble import RandomForestClassifier\n", "rf = RandomForestClassifier(n_estimators=50)\n", "rf.fit(np_train, y_train)"]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.7533333333333333"]}, "execution_count": 21, "metadata": {}, "output_type": "execute_result"}], "source": ["rf.score(np_test, y_test)"]}, {"cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 2: word2vec\n", "\n", "We use the [word2vec](https://en.wikipedia.org/wiki/Word2vec) approach from the 
[gensim](https://radimrehurek.com/gensim/models/word2vec.html) or [spacy](https://spacy.io/usage/vectors-similarity) module. With [spacy](https://spacy.io/usage/vectors-similarity), it is quite simple:"]}, {"cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [{"data": {"text/plain": ["(array([-0.21269655, -0.7653725 , -0.1316224 , -0.3766306 , 0.5549566 ,\n", " -0.60907495, 5.3928123 , 5.099738 , 4.210167 , 2.9974651 ],\n", " dtype=float32),\n", " (96,))"]}, "execution_count": 23, "metadata": {}, "output_type": "execute_result"}], "source": ["vv = nlp(X_train2.iloc[0, 0])\n", "list(vv)[0].vector[:10], vv.vector.shape"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We sum the word vectors."]}, {"cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([-11.796999 , -8.17019 , 3.1232045, -14.440253 , 20.460987 ,\n", " -8.738287 , 12.388309 , 23.718775 , -9.392727 , 1.9914403],\n", " dtype=float32)"]}, "execution_count": 24, "metadata": {}, "output_type": "execute_result"}], "source": ["sum([_.vector for _ in vv])[:10]"]}, {"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": ["np_train_vect = numpy.zeros((X_train2.shape[0], vv.vector.shape[0]))\n", "for i, sentance in enumerate(X_train2[\"sentance\"]):\n", " np_train_vect[i, :] = sum(v.vector for v in nlp(sentance.lower()))"]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": ["np_test_vect = numpy.zeros((X_test2.shape[0], vv.vector.shape[0]))\n", "for i, sentance in enumerate(X_test2[\"sentance\"]):\n", " np_test_vect[i, :] = sum(v.vector for v in nlp(sentance.lower()))"]}, {"cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": ["np_train_v = numpy.hstack([np_train_vect, train_cat])\n", "np_test_v = numpy.hstack([np_test_vect, test_cat])"]}, {"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [{"data": {"text/plain": 
["RandomForestClassifier(n_estimators=50)"]}, "execution_count": 28, "metadata": {}, "output_type": "execute_result"}], "source": ["rfv = RandomForestClassifier(n_estimators=50)\n", "rfv.fit(np_train_v, y_train)"]}, {"cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.6146666666666667"]}, "execution_count": 29, "metadata": {}, "output_type": "execute_result"}], "source": ["rfv.score(np_test_v, y_test)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Not as good..."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Exercise 3: comparing the two approaches\n", "\n", "With a [ROC](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html) curve, for example."]}, {"cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": ["pmodel1 = rf.predict_proba(np_test)[:, 1]\n", "pmodel2 = rfv.predict_proba(np_test_v)[:, 1]"]}, {"cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": ["from sklearn.metrics import roc_auc_score, roc_curve, auc\n", "fpr1, tpr1, th1 = roc_curve(y_test, pmodel1)\n", "fpr2, tpr2, th2 = roc_curve(y_test, pmodel2)"]}, {"cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [{"data": {"image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAQQAAAD4CAYAAAAKL5jcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAqw0lEQVR4nO3dd3xUVd7H8c9JQggkEEIIkJCEEJogIEKoSlsVKVIUFURBXRRdcfXRVWGLj+1Zy1p2LQiiAhYUrAiKFEFBlJbQpIYUSgiQRjopk5znjxuyMQQykJm5cye/9+vFK5mZm5lvhsxvzj1zitJaI4QQAF5mBxBCuA8pCEKISlIQhBCVpCAIISpJQRBCVPIx64FbtGiho6KizHp4IeqtuLi4DK11SE23mVYQoqKiiI2NNevhhai3lFJHznebnDIIISpJQRBCVJKCIISoZFofQk1KS0tJSUmhqKjI7Cgewc/Pj/DwcBo0aGB2FGERblUQUlJSaNKkCVFRUSilzI5jaVprMjMzSUlJoV27dmbHERZR6ymDUmq+UipNKbXnPLcrpdQbSqkEpdRupVSvSw1TVFREcHCwFAMHUEoRHBwsrS1xUezpQ1gIjLjA7SOBjhX/pgNz6hJIioHjyHMpLlatpwxa6w1KqagLHDIO+FAb86g3K6WaKaVCtdYnHBVSCFEhYS0c3UxJWTkHT+ZRUGz73c39/vgKyuvSPytwRB9CG+BYlcspFdedUxCUUtMxWhFERkY64KEdKzs7m08++YQHHngAgMcff5wVK1YwatQoXn755crjli1bxr59+5g1a9Y59xEQEEB+fv4Ff16Ii7E7JZtlO1MJPb6CaSeeA8BHKy6v8ehX6vRYjigINbVLa1x1RWs9D5gHEBMT43Yrs2RnZ/P2229XFoR33nmH9PR0GjZs+Lvjxo4dy9ixY2u9v/P9vBA1KSotI+X0GVJOF5Jy+gwJafnsOJJFVmoC/bzjedRnPgd9u/J+9H8ICQrkuq6t6RnRzKEZHFEQUoCIKpfDgVQH3K/LzZo1i8TERHr27ElISAgFBQX069ePv/71r0ycOLHyuIULFxIbG8tbb71FcnIykydPxmazMWLEf7taxo4de96fF/VPaVk5sYdPszU5i8yCYk4XlpJdWEJ2YSmnK77mF9tQlHOZOkYfrwMM8D7Iwz4HCWp42riTpuF0nvYF/wps47ScjigIy4AHlVKLgX5AjiP6D55Zvpd9qbl1DldV17CmPDWm5oYWwIsvvsiePXvYuXMnYDT/z35/Pg8//DB/+tOfmDp1KrNnz668ftmyZXb9vPBcGfnF/HQwnR8PpLHhUDp5RTaUgqZ+DQhq5ENUowL6NsikbVAG4UEniS7aT1jebnxLjb973TQc1fZaiOwP4X2hZVfwdu5IgVrvXSn1KTAUaKGUSgGeAhoAaK3nAiuAUUACUAjc7ayw7uiXX37hyy+/BGDKlCnMnDnT5ETCDEWlZew7kcvuY9nsTslhV0o2SRkFaA0RAfBsaCy9Gx0nTKfjk3sMco5BYbWPhIM7Qvcboe1AaDsQ1cz1/Wz2fMpwWy23a2CGwxJVuNA7uavNnj2bd999F4AVK1acc7t8vFe/2MrKiT+Vz+6UbHal5LA7JZuDJ/OwlRvdYiFNGnJFeCATrmjFOL2WsF1vok6chEbNIagttOoKnUdAs7YV/yKhWQT4+pv8m7nZSEWzNWnShLy8vHOunzFjBjNm1FzzrrrqKhYvXswdd9zBokWLnB1RuEhRaRnHsgo5mlXIkUzjq/F9AcdOn6HEVg5AUz8feoQ3Y/rgaHqEN+OKiEBaN/VDHVoDK++DrESI6A+3LDDe+d2cFIQqgoODueqqq+jWrRsjR46062def/11Jk+ezOuvv86ECROcnFA4S0Gxja3JWWxMyOCXhAwOnPz9G4O/rzeRwf50bNmEa7u0omtYU3qENyMquPHvW4iZifDJX+HQKgjuALctgU7Xg0VakcqsfRliYmJ09QV
S9u/fT5cuXUzJ46nkOa1ZVkEJu1Ky2Xk0m18TM9hxNBtbucbXx4uYtkH0iWpOuxb+RAY3pm3zxjT3973wqWFxPvz8CmyaDd6+MGQm9LsffHxd90vZSSkVp7WOqek2aSEIj5dfbGPP8ZzKc/5dx7JJOX0GMN64e7QJ5N7B0VzVvgUxUUH4NfC2/85LCiB2Afz6BuSfgismw7VPQZPWTvptnEsKgvA42YUlrNp7kq3Jp9mdkk1Cej5nG8LhQY3oER7IHf3b0iM8kO5tAmnidwnTw4tyYOs82PQ2nMmCdkNg4iKI6OPYX8bFpCAIj3CmpIwf9p/im52prI9Po7RM0yLA6O2/oUcYPSIC6dEmkOCAOo4aLciELXNgyzwozoGO18PgxyCir2N+EZNJQRCWVVpWzsaEDJbtTGXV3pMUlpTRuqkfd1/VjrFXhHF5WFPHfSScdwo2vQnb5kNpAXQZC4P+AmE9HXP/bkIKgrAUrTXbj2bzzc7jfLf7BJkFJTT182FczzDG9WxD36jmeHk5sEc/JwV+eR22fwhlJdDtZhj0KLT0zI5aKQjCEo5lFfL1juN8tT2Fw5mF+DXw4tourRjXsw2DO7Wgoc9FdATaIz8d1j0LOz8FNFxxG1z9CAS3d+zjuBkpCC5QdTLUa6+9xnvvvYePjw8hISHMnz+ftm3bmh3RLeUWlfL9byf4cvtxtiZnATAgOpgZwzowolvrS+sMtEfqDlh8OxSkQ++74KqHjNGE9YAUBCcoKyvD27vmd6wrr7yS2NhYGjduzJw5c3jiiSdYsmSJixO6l7JyzdGsQuJP5ZGQls+hU3nEn8onIS2fkrJyokP8efz6zozrGUZ4UGPnhtn9OSx7EBq3gGlrPK6PoDZSEKr517/+hZ+fHw899BCPPPIIu3btYt26daxdu5YFCxYwevRonn/+ebTWjB49mpdeegkwZkY++uijrFq1ildffZVDhw7xwgsvEBoaSqdOnSrXRBg2bFjlY/Xv35+PP/4YgIkTJ3LnnXcyatQoAO666y7GjBnD+PHjmTVrFj/99BPFxcXMmDGD++67rzLrRx99hJeXFyNHjuTFF1905VN10Ups5RzJLCAhzXixH0rLJ/5UHkkZBZVDgQHaNGtEh5YBDOrYgpHdQ7kiPND580XKy2DtM0Z/QeRAuPVDCKhxtzOP5r4F4ftZcPI3x95n6+4w8sIvmsGDB/Pqq6/y0EMPERsbS3FxMaWlpWzcuJGOHTsyc+ZM4uLiCAoKYvjw4SxdupTx48dTUFBAt27dePbZZzlx4gSTJ08mLi6OwMBAhg0bxpVXXnnOY73//vuVQ6QnTZrEkiVLGDVqFCUlJaxdu5Y5c+bw/vvvExgYyLZt2yguLuaqq65i+PDhHDhwgKVLl7JlyxYaN25MVlaWY5+rOsgvtpFY8aJPTDe+JqTncySzkLLy/46MDQ9qRMeWAQzpFEKHlgF0bNWEDi0DCGjo4j/LM9nw5TRI+AFi/ggjXnLLEYau4L4FwSS9e/cmLi6OvLw8GjZsSK9evYiNjeXnn39mzJgxDB06lJAQ453j9ttvZ8OGDYwfPx5vb+/KuQxbtmz53XETJ04kPj7+d4/z8ccfExsby/r16wEYOXIkDz30EMXFxaxcuZLBgwfTqFEjVq9eze7du/niiy8AyMnJ4dChQ/zwww/cfffdNG5sNKGbN2/ukuenuhM5Z/j5UAb7UnMrX/wncv47rdfHS9E2uDEdWwYwsltrOrQMoENIE6JD/PF39Qu/Junx8OkkyD4Co1+DPtPMTmQqN/gfOY9a3smdpUGDBkRFRbFgwQIGDhxIjx49+PHHH0lMTCQyMpK4uLgaf87Pz+93/QYXauL+8MMP/POf/2T9+vWVpxJ+fn4MHTqUVatWsWTJEm67zZh1rrXmzTff5Prrr//dfaxcudKUadfFtjJiD59mfXw66w+mc/CUMQmosa837UMC6NeuufGir/jXNtifBt5uukHYgRXw9X3G3IM
7l1tiNqLTaa1N+de7d29d3b59+865zgxPPfWUjoiI0GvWrNEnT57UERERevz48To1NVVHRkbq9PR0bbPZ9DXXXKOXLl2qtdba39+/8ufPHpeRkaFLSkr01VdfrWfMmKG11nr79u06Ojpax8fHn/O43377rR4/frwODw/XxcXFWmut33nnHT1u3DhdUlKitdb64MGDOj8/X3///fd6wIABuqCgQGutdWZmZo2/iyOe0yMZBfqDX5P1Hxds1Zf943vddua3usPfvtOT392k31mfoPefyNFlZeV1fhyXOZOt9dIZWj/VVOs5V2t9+qjZiVwKiNXneV26bwvBRIMGDeKf//wnAwYMwN/fHz8/PwYNGkRoaCgvvPACw4YNQ2vNqFGjGDdu3Dk/HxoaytNPP82AAQMIDQ2lV69elJWVAcZKzPn5+dxyyy2Asfr0smXLABg+fDhTp05l7Nix+Poa57D33HMPhw8fplevXmitCQkJYenSpYwYMYKdO3cSExODr68vo0aN4vnnn3fI73+mpIzNSZlGKyA+neSMAgAimjfi5t7hDO0cQv/oYPdo8l+sxHXwzZ8hLxWufhSGzgIfWQT3LJn+7OHsfU7LyjWfbDnC6n2n2JKcRYmtHL8GXgyIDmZIpxCGdG557tx/KynOg9VPQtwCaNEJxs+B8BpnAHs8mf4sarV421Ge/GYvHVoGMLV/W4Z0DqFPVPOLmwrsrpI3wDczIPsYDPwzDPs7NGhkdiq3JAVBcKakjNd/OERM2yA+v3+AdVsB1ZUUwA/PwNZ3oHk0/HGlsYKxOC+3Kwhaa8/5gzSZvaeDC389TFpeMW9N7uU5z/2RTbD0T3A62Vi56JqnwNfJoxw9gFsVBD8/PzIzM2UHaAfQFdvB+/n5XfC4nMJS5vyUwLDOIfRtZ85YBofRGo5uNhYu2fu1Mf/gru8g6mqzk1mGWxWE8PBwUlJSSE9PNzuKR/Dz8yM8PPyCx7yzIZHcIhuPX3+Zi1I5QUkh/PY5bH0XTv0GfoHGhKTBT0DDALPTWYpbFYQGDRrQrl07s2PUG2m5Rcz/JZmxV4TRNayp2XEuXmYixM6HHR8ZS5q16g5j3oDut8jpwSVyq4IgXOuNdYewlWkeva6T2VHsV15uzDnYOs/46uVtrF7Ud7rRYSinmnUiBaEe0VqTmF7A5qRMNiVlsmrPSSb1jSCqhfk7BtXqzGnYsQi2vWd0FAa0MgYV9b7LsiscuyMpCB5Ma01ShlEANidlsTkpk/S8YgBaN/Vj/JVt+Mt1nU1OWYvTR4z9DnZ/DrYzEDkArnkSLhtTb2ckOpMUBA+itSY5o6Dyxb85KZO0igLQqmlDrmofTP/oYAa0DyayuQVGHWYmwsLRxvTkHrdC33uNKezCaaQgeICycs0Xccd4Y20Cx7ONDUhaNmnIgIoC0D862HrDjk8fgQ/Ggq0Y7l1nbJAqnE4KgsVtSszkuW/3se9ELr3bBvHAsPYMiA6mXQt/axWAqnJS4IMboCTfmJYsxcBlpCBY1JHMAp5fsZ9Ve0/Rplkj3pp8JaO7h1q3CJyVewI+GGOcJkz9BkJ7mJ2oXrGrICilRgCvA97Ae1rrF6vdHgh8DERW3OcrWusFDs4qMFYinr0ugQW/HMbHW/H49Z2ZdnU7z5iElJ8GH441vk75Gtr0MjtRvVNrQVBKeQOzgeuAFGCbUmqZ1npflcNmAPu01mOUUiHAQaXUIq11iVNS10Nl5ZrF247y2up4sgpLuKV3OI8N70zLphcemmwZBZnw4TjjdOGOLz1mazSrsaeF0BdI0FonASilFgPjgKoFQQNNlNFeDQCyAJuDs9ZbmxIzeWb5Xg6czKNvu+Z8cENXurUJNDuW4xRmwUfjICsJJn8mS5mZyJ6C0AY4VuVyCtCv2jFvAcuAVKAJMFFrXV7tGJRS04HpYKwUJC6sxFbOK6sPMm9DEhHNGzHn9l6M6Nba+v0EVRXlwMc3QfpBuO1TiB5idqJ6zZ6CUNN
fX/V5tdcDO4E/AO2BNUqpn7XWub/7Ia3nAfPAWDHpotPWI4np+Ty8eAd7jucypX9b/j66i2f0E1RVnAcf3wwn98DEj6HDtWYnqvfsKQgpQESVy+EYLYGq7gZerFjAMUEplQxcBmx1SMp6RGvN57EpPLVsLw0beDFvSm+GX+6BQ3NLCmDRrXA8Dm79ADqPMDuRwL6CsA3oqJRqBxwHJgGTqx1zFLgG+Fkp1QroDCQ5Mmh9kFNYyt++/o3vfjvBwPbBvHZrT1oHekinYVUFGfDZVDi2GSa8B13GmJ1IVKi1IGitbUqpB4FVGB87ztda71VK3V9x+1zgOWChUuo3jFOMmVrrDCfm9igHT+axfFcqn8cdIzO/hCdGdOa+we3xduS25u4idQcsmWJ8tHjTu9BtgtmJRBV2jUPQWq8AVlS7bm6V71OB4Y6N5tmS0vP5dvcJlu9K5VBaPl4KBrZvwTtTOtMzopnZ8ZxjxyL49hEIaAnTVkHYudvbCXPJSEUXOpZVyHe/GUVgb2ouSkGfqOY8N+5yRnQLJaSJh+4PYCuBVX81pi63GwI3zwf/FmanEjWQguBkRaVlfLr1KMt2pbLjaDYAPSOa8Y/RXRjdI5TQQA9fDjz3BHx+JxzbAgMfMhY79ZY/O3cl/zNOpLXm8S92s3xXKl1Dm/LEiM6M6RFGRPN6srzX0c1G52FxPty8ALrdZHYiUQspCE60aMtRlu9K5bHhnXjwDx3NjuM6WhunBytnGSsfT1kqMxYtQgqCk+w5nsOzy/cxpFMIDwztYHYc1yk9A9/9BXYugo7Xw03zoFEzs1MJO0lBcIKcM6U8sGg7wQG+/HtiT7w88ePDmmQfNT5SPLEThsyCITPBy023ghc1koLgYFprnvhiF6nZZ1hyX3+a+9eTdf8OrIBlD0JZKdy2GDqPNDuRuARSEBzs/Y3JrNp7in+M7kLvthbfCckeRTmw8q/GKUKr7nDLQmhRj06RPIwUBAfRWjP7xwReXRPP8K6tmHZ1PdhwJmm9saty7nEY9JhxiiArIVuaFAQHKCyx8fjnu/nutxOM7xnGixN6eNYU5epKCmHtM7BlLgR3gGlrIDzG7FTCAaQg1NGxrELu/TCW+FN5/G3UZdw7KNqzi0FKLHx9H2QmQN/74NqnZds0DyIFoQ5+TcxgxqLt2Mo18+/qw9DOLc2O5Dy2Elj/Emx8DZqEGQugRg81O5VwMCkIl0BrzYebjvDst/uICm7Mu1NjiA7x4F2GT+2Dr6fDyd+g5+0w4gVjh2XhcaQgXKTCEhvPLNvHkthjXNulJf+e2JMmfg3MjuUc5WWw6S1Y939GAZj0CVw22uxUwomkINgpp7CUDzYdZsEvyZwuLOXPf+jAI9d28txBR1lJsPQBOLrJWMDkhv/IDMV6QApCLdJyi3h/YzIfbz5CQUkZ11zWkgeGdaB32yCzozmH1hA7H1Y/CV4+cOM8Y19FT+4oFZWkIJzH0cxC5m5I5Iu4FGxl5dzQI4w/DW1Pl9CmZkdzntwTxmjDhB8gehiMewsCw81OJVxICkI1WmueX7Gf9zcm4+Plxc0x4dw3OJq2wf5mR3Ou43Hw6W3GSsijXoE+90iroB6SglDNW+sSePfnZCbGRPDo8E608pSdkS5k71L4+n4ICJGpyvWcFIQqvoxL4dU18dzUqw0vTuju2QOMwOgv2PgarH0WwvsanyIEhJidSphICkKFXxIymPnlbga2D+bFmzx86DGArRiW/w/s+gS63QzjZkODetAaEhckBQFjGfT7P4qjfUgAc6f0xtfHw+fwF2TCkjvg6K8w9G8w5AnpLxCAFARO5RZx94KtNPL1ZsHdfWjqqYOMzkqPh09uhdxUmPA+dL/Z7ETCjdTrgpBfbOPuBdvIOVPKkvsGENbMw1dATvrJWPTU2xfu+la2XBfn8PC28fml5xVz+7ubOXgqj9m39/Ks7dVrErcQPp5gTEy6Z60UA1GjetlCSErP584FW0nPK2buHb0
9e5ZieRms+V9jTkKHa43l0P08eHCVqJN6VxDijmRxzwexeCnFp/f258pIDx2CDMZ+CF/dCwdXQN/pcP0LskmKuKB69dexcs9JHl68g9BAPxbe3ZeoFh48+jAnBT6ZBGl7YeTL0G+62YmEBdSbgrDwl2Se+XYfPSOa8d7UGIIDPHQfRYDj241hyCUFMPlz6Hit2YmERdSLgrA+Pp2nl+/juq6teGPSlTTy9TY7kvMcXAmf31UxDPlrGYYsLopdnzIopUYopQ4qpRKUUrPOc8xQpdROpdRepdR6x8a8dLaycp6rWNlo9uRenl0M9i+HJbdDy8vgnnVSDMRFq7WFoJTyBmYD1wEpwDal1DKt9b4qxzQD3gZGaK2PKqXcptv+k61HSUjL5x1PH4G492v4Yhq06QV3fClLnIlLYs8rpC+QoLVO0lqXAIuBcdWOmQx8pbU+CqC1TnNszEuTU1jKv9fEMyA6mOFdW5kdx3l++8IoBuF94I6vpBiIS2ZPQWgDHKtyOaXiuqo6AUFKqZ+UUnFKqak13ZFSarpSKlYpFZuenn5piS/Cm+sOkX2mlH/c0MVzJyvtWmJ8tBg5oKJlIGMMxKWzpyDU9ErS1S77AL2B0cD1wJNKqU7n/JDW87TWMVrrmJAQ506zTc4o4INNh5kYE8HlYR76jrljkbFHQtTVcPtn0NCDV34WLmHPpwwpQESVy+FAag3HZGitC4ACpdQG4Aog3iEpL8FL3x/A19uLR4efU5c8Q9wHsPxhaD/MWMeggYfPwxAuYU8LYRvQUSnVTinlC0wCllU75htgkFLKRynVGOgH7HdsVPvtPJbNyr0nmT64PS2beOAc/23vwfKHjKHIkz6VYiAcptYWgtbappR6EFgFeAPztdZ7lVL3V9w+V2u9Xym1EtgNlAPvaa33ODP4hby86gDB/r5MG+SBG65ueQe+fwI6jYRbPwAfDx5gJVzOroFJWusVwIpq182tdvll4GXHRbs0Gw9l8EtCJk+N6UpAQw8bd7VpNqz6G1x2gzFJSXZaFg7mUR/Ma615aeUB2jRrxOR+kWbHcayN/zGKQddxcMtCKQbCKTyqIHy/5yS/Hc/hkes60dDHg0YkbngFfngKuk2ACfPB28NXdRKm8Zg2ta2snFdWH6RjywBuvLL6MAkL++kl+Ol56DERxr0t05eFU3lMC+Gr7cdJSi/gses74+0J+y1qbWyy+tPzxo7L4+dIMRBO5zF/YZ/FHqNTqwDPGKKsNax9Bjb+G3pNhRteBy+Pqd3CjXnEX1labhFxR08zunuYZwxR/vF5oxjETJNiIFzKI1oIq/aeRGsY2b212VHq7tc3YcO/jJbB6FdlvwThUh7x1vP9npNEh/jTsaXFx/LHfQCr/wGX3wg3/EeKgXA5yxeEzPxitiRnMbJba2ufLuz50pib0OE6uHEeeHnQx6bCMixfENbsO0VZuWZkt1Czo1y6+NXw1XRjCvOtH8qgI2EayxeE7/ecJKJ5Iy4Ps+g6AId/gc+mQKvLYfJi8G1sdiJRj1m6ICSk5bPhUDrjrmhjzdOF1B3wyURo1hbu+FpWOhKms3RBeGd9Ir7eXtx1VZTZUS5e2gH46CZoHARTl4J/sNmJhLBuQUjNPsPXO44zqU8ELay2x8Lpw/DReGNOwpSl0DTM5EBCGCw7DuHdn5MAuHdwtMlJLlLeSfhwPJSegbtXQHB7sxMJUcmSBSGroITFW48xtmcY4UEW6oQrzIKPboT8NLhzmdGRKIQbsWRBWPjrYYpsZTww1ELvriWFsOhmyEyE2z+H8BizEwlxDksWhM1JmVwZ0YwOLZuYHcU+WsN3fzH2XJz4MUQPMTuREDWyZKdiWm4Rbax0qrD9A9j1CQyZCV1uMDuNEOdluYKgteZUbjGtm1rkk4XUHbDicWh/DQx5wuw0QlyQ5QpCbpGNM6VltGpqgeXVC7Pgs6ng3xJuelfmJwi3Z7k+hLTcIgBauntBKC83dlXKPQF/XCUDj4Q
lWK4gnKwoCK3dvSD8/CocWm2saRDe2+w0QtjFcqcMp3KLAWjlzn0Iievgx38aC6PGTDM7jRB2s2BBMFoIbtuHkJNibM0echnc8G9Z5ERYiiULQmCjBvg1cMMOOlsJfHYnlJXCxI/A19/sREJcFOv1IeQUue/pwuq/w/FYY5GTFh3NTiPERbNeCyGv2D1PF3Z/DlvnwYAHje3WhLAgyxWEjLxiQpq4WQshbb+xPXvkQLj2abPTCHHJLFcQMguKCfZ3ozUHi/NgyRTwDYBbFsi+i8LSLNWHUFhio6i0nOb+btJC0Bq+eRCykozpzE08YF8IUa/Z1UJQSo1QSh1USiUopWZd4Lg+SqkypdTNjov4X1kFJQDu00LYPAf2LYVrn4Koq81OI0Sd1VoQlFLewGxgJNAVuE0p1fU8x70ErHJ0yLPOFoTm7lAQjmyCNU/CZTfAwIfMTiOEQ9jTQugLJGitk7TWJcBioKZu9D8DXwJpDsz3O5kVBSHI7IKQnwaf3wXNImH82zL4SHgMewpCG+BYlcspFddVUkq1AW4E5l7ojpRS05VSsUqp2PT09IvNSla+G5wylNngiz9CUQ7c+pEsnS48ij0Foaa3P13t8n+AmVrrsgvdkdZ6ntY6RmsdExISYmfE/0pMz8fHS5k7DuHH/4PDPxvDklt3My+HEE5gz6cMKUBElcvhQGq1Y2KAxRWbpbQARimlbFrrpY4IedbW5Cy6hwfSyNekYcsHvjO2ae99N/S8zZwMQjiRPS2EbUBHpVQ7pZQvMAlYVvUArXU7rXWU1joK+AJ4wNHFoKi0jF0p2fRrZ9K6AllJ8PWfIOxKGPGiORmEcLJaWwhaa5tS6kGMTw+8gfla671Kqfsrbr9gv4GjbD96mtIyTb92zV3xcL9XegaWTAUvL7jlA2jghkOnhXAAuwYmaa1XACuqXVdjIdBa31X3WOfampyFUtA7KsgZd39+Z1dMPrXHWD49qK1rH18IF7LM0OWtyVl0DW1KUz8XDw3e/iHsXGQskNrxOtc+thAuZomCUGIrZ/vR067vPzixq2LF5D8YS6gL4eEsURBO5RZRVFpO59YBrntQW4nRidi4Odz0nqyYLOoFS01u8nLliMCN/4a0vXDbElkxWdQblmghuFzaftjwMnS/BTqPMDuNEC4jBaG68jJjSrNfUxlvIOodS50yuMSWd4x1ESe8D/4tzE4jhEtJC6GqrGRY9xx0GgHdJpidRgiXs0RByC+2ARDQ0IkNGq1h+cOgvGH0azKlWdRLljhlyD1TCkDTRk4clLTjI0heb8xiDGxT+/FCeCBLtBByi4wWgtNGKeaegFX/gLZXQ6+7nPMYQliANQpCRQsh0BkthLNzFcqKYewbxgQmIeopS/z151SeMjjhDCd+JRz8Dob9HYLbO/7+hbAQSxSE3CKjIDi8U1Fr+PF5CGoH/R9w7H0LYUHWKAhnbAQ09MHH28Fx41fCyd0w+HHwtkT/qhBOZY2CUFRKUz8ntA7WvwTN2kKPWx1730JYlCUKQnZhqeM/cjy0BlJ3wODHZPs1ISpYoiCknC6kTbNGjrvDs62DwEjoMclx9yuExbl9QdBacySzkLbB/o6708S1xnyFQY+AjxvsAiWEm3D7gpCeV8yZ0jKiWjR2zB1qDT+9BE3DoeftjrlPITyE2xeE5IwCAKIc1UJI+glStsLV/wM+brKLtBBuwu0LwpHMQsBBBeFs30GTMOg1te73J4SHcfuCkJxZQANvRVgzB+yFcHgjHN0krQMhzsPtC8KRzAIigho7ZlDS+pcgoDX0urPu9yWEB7JAQSgkMtgBHYq5J4xNWvtNl52XhDgPty8IxbZy/H0dMErxeKzxNWpw3e9LCA/l9gXBYVK2gbcvhPYwO4kQbqseFYRYaN1DOhOFuID6URDKbHB8O4T3MTuJEG7N7QtCia0cH+86LniathdsZyA8xjGhhPBQdhUEpdQIpdRBpVSCUmpWDbffrpTaXfHvV6XUFY4
KeLqghKDGdZxvkLLN+CotBCEuqNaCoJTyBmYDI4GuwG1Kqa7VDksGhmitewDPAfMcEa7YVkZesY1g/7oWhFjwbwnNIh0RSwiPZU8LoS+QoLVO0lqXAIuBcVUP0Fr/qrU+XXFxMxDuiHCnC4yl05oHOKCFEN5H9loQohb2FIQ2wLEql1MqrjufacD3Nd2glJqulIpVSsWmp6fX+sBZBSUANK/LKUNhFmQmSP+BEHawpyDU9LaqazxQqWEYBWFmTbdrredprWO01jEhISG1PnBlQajLKcPx7cZX6T8Qolb2DAFMASKqXA4HUqsfpJTqAbwHjNRaZzoiXGZBMQDBdTllSNkGygvCrnREJCE8mj0thG1AR6VUO6WULzAJWFb1AKVUJPAVMEVrHe+ocHmO2LEpZRu0vBwaBjgolRCeq9YWgtbappR6EFgFeAPztdZ7lVL3V9w+F/hfIBh4WxkddzatteNO2i+1L/B4nLEgSr/7HRZFCE9m16whrfUKYEW16+ZW+f4e4B7HRqujkgL4ajo0aQ1DHjc7jRCW4Lm7k6x+0vh0YeoyaBRkdhohLMHthy5fkvjVEPs+DHgQooeYnUYIy/C8glCQAd/MMDoS//Ck2WmEsBTPOmXQGpY/DEXZMOVrWRlJiIvkWS2EfUvhwLdGy6B1N7PTCGE5nlUQdnxsbM824EGzkwhhSZ5TEAqzjDEH3W4EL8/5tYRwJc955exfBuU2uPwms5MIYVmeUxD2fAXNoyHUYWuzCFHveEZBOH3Y2HOh2wRZ80CIOrB+QdAavn0EGjSG3neZnUYIS7P+OIRdn0LiOhj1CgQ6ZKEmIeota7cQ8k7Byr9C5ACImWZ2GiEsz9oFYcVjUHoGxr4pHzUK4QDWfRXt+8b4qHHoLGjR0ew0QngEaxaEM6fhu8eMrdkG/tnsNEJ4DGt2Kv74PBRmwh1fgncdllcTQvyONVsICT9A55Gyk7MQDma9gnDmNGQlQZteZicRwuNYryCk7jC+hklBEMLRrFcQzm68IvssCOFw1isIqTugeXto1MzsJEJ4HGsVhOyjkPijMTJRCOFw1ikIWsPy/zFmMw6tcetIIUQdWWccwq7FkLjWmMTULNLsNEJ4JEu0EFRhBqycBRH9ZRKTEE5kiYLg/+vLUJwHY16XSUxCOJHbv7o6qWM02v0h9LkHWl5mdhwhPJrbF4THfT5DN2xqzGoUQjiV2xeEaJVKSeQgaNzc7ChCeDy3Lgj+hcdp73UCW4uuZkcRol6wqyAopUYopQ4qpRKUUue03ZXhjYrbdyulHDLRIDRzMwDFncc64u6EELWotSAopbyB2cBIoCtwm1Kq+lv2SKBjxb/pwBxHhFO6DMDoQxBCOJ09LYS+QILWOklrXQIsBsZVO2Yc8KE2bAaaKaVCHZxVCOFk9hSENsCxKpdTKq672GNQSk1XSsUqpWLT09NrfeDGrdqzPWAwDXwb2RFTCFFX9gxdrmkrJH0Jx6C1ngfMA4iJiTnn9uq6D7kJhshejUK4ij0thBQgosrlcCD1Eo4RQrg5ewrCNqCjUqqdUsoXmAQsq3bMMmBqxacN/YEcrfUJB2cVQjhZracMWmubUupBYBXgDczXWu9VSt1fcftcYAUwCkgACoG7nRdZCOEsdk1/1lqvwHjRV71ubpXvNTDDsdGEEK7m1iMVhRCuJQVBCFFJCoIQopIUBCFEJWX0B5rwwEqlA0fsOLQFkOHkOHUlGevO3fOB+2e0N19brXVITTeYVhDspZSK1VrHmJ3jQiRj3bl7PnD/jI7IJ6cMQohKUhCEEJWsUBDmmR3ADpKx7tw9H7h/xjrnc/s+BCGE61ihhSCEcBEpCEKISm5TEMxayNXBGW+vyLZbKfWrUuoKd8pX5bg+SqkypdTNrsxX8di1ZlRKDVVK7VRK7VVKrXenfEqpQKXUcqXUrop8Lp3Zq5Sar5RKU0rtOc/tdXudaK1N/4cxrToRiAZ8gV1A12rHjAK+x1idqT+wxQ0
zDgSCKr4f6cqM9uSrctw6jNmrN7vhc9gM2AdEVlxu6Wb5/ga8VPF9CJAF+Low42CgF7DnPLfX6XXiLi0EKyzkWmtGrfWvWuvTFRc3Y6wc5Tb5KvwZ+BJIc2G2s+zJOBn4Smt9FEBr7cqc9uTTQBOllAICMAqCzVUBtdYbKh7zfOr0OnGXguCwhVyd6GIffxpGpXaVWvMppdoANwJzMYc9z2EnIEgp9ZNSKk4pNdVl6ezL9xbQBWOJwN+Ah7XW5a6JZ5c6vU7sWiDFBRy2kKsT2f34SqlhGAXhaqcmqvawNVxXPd9/gJla6zLjDc7l7MnoA/QGrgEaAZuUUpu11vHODod9+a4HdgJ/ANoDa5RSP2utc52czV51ep24S0GwwkKudj2+UqoH8B4wUmud6aJsYF++GGBxRTFoAYxSStm01ktdktD+/+cMrXUBUKCU2gBcAbiiINiT727gRW2csCcopZKBy4CtLshnj7q9TlzVGVJLR4kPkAS047+dOZdXO2Y0v+8s2eqGGSMx1pUc6I7PYbXjF+L6TkV7nsMuwNqKYxsDe4BubpRvDvB0xfetgONACxc/j1Gcv1OxTq8Tt2ghaAss5Gpnxv8FgoG3K96FbdpFs+PszGcqezJqrfcrpVYCu4Fy4D2tdY0fsZmRD3gOWKiU+g3jRTdTa+2yKdFKqU+BoUALpVQK8BTQoEq+Or1OZOiyEKKSu3zKIIRwA1IQhBCVpCAIISpJQRBCVJKCIISoJAVBCFFJCoIQotL/Aypqjc2o30KJAAAAAElFTkSuQmCC\n", "text/plain": ["
"]}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}], "source": ["import matplotlib.pyplot as plt\n", "fig, ax = plt.subplots(1,1, figsize=(4,4))\n", "ax.plot(fpr1, tpr1, label='tf-idf')\n", "ax.plot(fpr2, tpr2, label='word2vec')\n", "ax.legend();"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## A short error analysis\n", "\n", "We combine the errors of both models on the test set."]}, {"cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [{"data": {"text/html": ["
"], "text/plain": [" sentance source \\\n", "850 It looks very nice. amazon_cells_labelled \n", "415 As a European, the movie is a nice throwback t... imdb_labelled \n", "585 Great food and great service in a clean and fr... yelp_labelled \n", "785 This allows the possibility of double booking ... amazon_cells_labelled \n", "440 Both do good jobs and are quite amusing. imdb_labelled \n", "\n", " model1 model2 label \n", "850 0.62 0.62 1 \n", "415 0.68 0.54 1 \n", "585 0.94 0.78 1 \n", "785 0.48 0.54 0 \n", "440 0.64 0.46 1 "]}, "execution_count": 33, "metadata": {}, "output_type": "execute_result"}], "source": ["final = X_test.copy()\n", "final[\"model1\"] = pmodel1\n", "final[\"model2\"] = pmodel2\n", "final[\"label\"] = y_test\n", "final.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We look at some of the errors."]}, {"cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "
\n", " \n", "
\n", "
\n", "
sentance
\n", "
source
\n", "
model1
\n", "
model2
\n", "
label
\n", "
\n", " \n", " \n", "
\n", "
527
\n", "
Be sure to order dessert, even if you need to ...
\n", "
yelp_labelled
\n", "
0.54
\n", "
0.20
\n", "
1
\n", "
\n", "
\n", "
707
\n", "
This is cool because most cases are just open ...
\n", "
amazon_cells_labelled
\n", "
0.38
\n", "
0.22
\n", "
1
\n", "
\n", "
\n", "
449
\n", "
I won't say any more - I don't like spoilers, ...
\n", "
imdb_labelled
\n", "
0.20
\n", "
0.22
\n", "
1
\n", "
\n", "
\n", "
676
\n", "
I can't wait to go back.
\n", "
yelp_labelled
\n", "
0.20
\n", "
0.22
\n", "
1
\n", "
\n", "
\n", "
908
\n", "
I can hear while I'm driving in the car, and u...
\n", "
amazon_cells_labelled
\n", "
0.34
\n", "
0.24
\n", "
1
\n", "
\n", " \n", "
\n", "
"], "text/plain": [" sentance source \\\n", "527 Be sure to order dessert, even if you need to ... yelp_labelled \n", "707 This is cool because most cases are just open ... amazon_cells_labelled \n", "449 I won't say any more - I don't like spoilers, ... imdb_labelled \n", "676 I can't wait to go back. yelp_labelled \n", "908 I can hear while I'm driving in the car, and u... amazon_cells_labelled \n", "\n", " model1 model2 label \n", "527 0.54 0.20 1 \n", "707 0.38 0.22 1 \n", "449 0.20 0.22 1 \n", "676 0.20 0.22 1 \n", "908 0.34 0.24 1 "]}, "execution_count": 34, "metadata": {}, "output_type": "execute_result"}], "source": ["erreurs = final[final[\"label\"] == 1].sort_values(\"model2\")\n", "erreurs.head()"]}, {"cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [{"data": {"text/plain": ["['Be sure to order dessert, even if you need to pack it to-go - the tiramisu and cannoli are both to die for.',\n", " 'This is cool because most cases are just open there allowing the screen to get all scratched up.',\n", " \"I won't say any more - I don't like spoilers, so I don't want to be one, but I believe this film is worth your time. \",\n", " \"I can't wait to go back.\",\n", " \"I can hear while I'm driving in the car, and usually don't even have to put it on it's loudest setting.\"]"]}, "execution_count": 35, "metadata": {}, "output_type": "execute_result"}], "source": ["list(erreurs[\"sentance\"])[:5]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Le mod\u00e8le 2 reconna\u00eet mal les n\u00e9gations visiblement. On regarde le mod\u00e8le 1."]}, {"cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "
\n", " \n", "
\n", "
\n", "
sentance
\n", "
source
\n", "
model1
\n", "
model2
\n", "
label
\n", "
\n", " \n", " \n", "
\n", "
436
\n", "
The soundtrack wasn't terrible, either.
\n", "
imdb_labelled
\n", "
0.06
\n", "
0.34
\n", "
1
\n", "
\n", "
\n", "
412
\n", "
Not too screamy not to masculine but just righ...
\n", "
imdb_labelled
\n", "
0.08
\n", "
0.26
\n", "
1
\n", "
\n", "
\n", "
161
\n", "
I was seated immediately.
\n", "
yelp_labelled
\n", "
0.10
\n", "
0.38
\n", "
1
\n", "
\n", "
\n", "
619
\n", "
Don't miss it.
\n", "
imdb_labelled
\n", "
0.12
\n", "
0.32
\n", "
1
\n", "
\n", "
\n", "
448
\n", "
My 8/10 score is mostly for the plot.
\n", "
imdb_labelled
\n", "
0.14
\n", "
0.48
\n", "
1
\n", "
\n", " \n", "
\n", "
"], "text/plain": [" sentance source model1 \\\n", "436 The soundtrack wasn't terrible, either. imdb_labelled 0.06 \n", "412 Not too screamy not to masculine but just righ... imdb_labelled 0.08 \n", "161 I was seated immediately. yelp_labelled 0.10 \n", "619 Don't miss it. imdb_labelled 0.12 \n", "448 My 8/10 score is mostly for the plot. imdb_labelled 0.14 \n", "\n", " model2 label \n", "436 0.34 1 \n", "412 0.26 1 \n", "161 0.38 1 \n", "619 0.32 1 \n", "448 0.48 1 "]}, "execution_count": 36, "metadata": {}, "output_type": "execute_result"}], "source": ["erreurs = final[final[\"label\"] == 1].sort_values(\"model1\")\n", "erreurs.head()"]}, {"cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [{"data": {"text/plain": ["[\"The soundtrack wasn't terrible, either. \",\n", " 'Not too screamy not to masculine but just right. ',\n", " 'I was seated immediately.',\n", " \"Don't miss it. \",\n", " 'My 8/10 score is mostly for the plot. ']"]}, "execution_count": 37, "metadata": {}, "output_type": "execute_result"}], "source": ["list(erreurs[\"sentance\"])[:5]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Idem, voyons l\u00e0 o\u00f9 les mod\u00e8les sont en d\u00e9saccords."]}, {"cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": ["final[\"diff\"] = final.model1 - final.model2"]}, {"cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "
\n", " \n", "
\n", "
\n", "
sentance
\n", "
source
\n", "
model1
\n", "
model2
\n", "
label
\n", "
diff
\n", "
\n", " \n", " \n", "
\n", "
390
\n", "
If you want healthy authentic or ethic food, t...
\n", "
yelp_labelled
\n", "
0.30
\n", "
0.72
\n", "
1
\n", "
-0.42
\n", "
\n", "
\n", "
797
\n", "
A good quality bargain.. I bought this after I...
\n", "
amazon_cells_labelled
\n", "
0.34
\n", "
0.68
\n", "
1
\n", "
-0.34
\n", "
\n", "
\n", "
53
\n", "
This phone is pretty sturdy and I've never had...
\n", "
amazon_cells_labelled
\n", "
0.38
\n", "
0.72
\n", "
1
\n", "
-0.34
\n", "
\n", "
\n", "
691
\n", "
Shot in the Southern California desert using h...
\n", "
imdb_labelled
\n", "
0.36
\n", "
0.70
\n", "
1
\n", "
-0.34
\n", "
\n", "
\n", "
448
\n", "
My 8/10 score is mostly for the plot.
\n", "
imdb_labelled
\n", "
0.14
\n", "
0.48
\n", "
1
\n", "
-0.34
\n", "
\n", " \n", "
\n", "
"], "text/plain": [" sentance source \\\n", "390 If you want healthy authentic or ethic food, t... yelp_labelled \n", "797 A good quality bargain.. I bought this after I... amazon_cells_labelled \n", "53 This phone is pretty sturdy and I've never had... amazon_cells_labelled \n", "691 Shot in the Southern California desert using h... imdb_labelled \n", "448 My 8/10 score is mostly for the plot. imdb_labelled \n", "\n", " model1 model2 label diff \n", "390 0.30 0.72 1 -0.42 \n", "797 0.34 0.68 1 -0.34 \n", "53 0.38 0.72 1 -0.34 \n", "691 0.36 0.70 1 -0.34 \n", "448 0.14 0.48 1 -0.34 "]}, "execution_count": 39, "metadata": {}, "output_type": "execute_result"}], "source": ["erreurs = final[final[\"label\"] == 1].sort_values(\"diff\")\n", "erreurs.head()"]}, {"cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "
\n", " \n", "
\n", "
\n", "
sentance
\n", "
source
\n", "
model1
\n", "
model2
\n", "
label
\n", "
diff
\n", "
\n", " \n", " \n", "
\n", "
464
\n", "
The inside is really quite nice and very clean.
\n", "
yelp_labelled
\n", "
0.94
\n", "
0.46
\n", "
1
\n", "
0.48
\n", "
\n", "
\n", "
4
\n", "
The mic is great.
\n", "
amazon_cells_labelled
\n", "
0.90
\n", "
0.42
\n", "
1
\n", "
0.48
\n", "
\n", "
\n", "
68
\n", "
Great for iPODs too.
\n", "
amazon_cells_labelled
\n", "
0.96
\n", "
0.46
\n", "
1
\n", "
0.50
\n", "
\n", "
\n", "
341
\n", "
It is a really good show to watch.
\n", "
imdb_labelled
\n", "
0.84
\n", "
0.32
\n", "
1
\n", "
0.52
\n", "
\n", "
\n", "
306
\n", "
Has been working great.
\n", "
amazon_cells_labelled
\n", "
0.90
\n", "
0.32
\n", "
1
\n", "
0.58
\n", "
\n", " \n", "
\n", "
"], "text/plain": [" sentance source \\\n", "464 The inside is really quite nice and very clean. yelp_labelled \n", "4 The mic is great. amazon_cells_labelled \n", "68 Great for iPODs too. amazon_cells_labelled \n", "341 It is a really good show to watch. imdb_labelled \n", "306 Has been working great. amazon_cells_labelled \n", "\n", " model1 model2 label diff \n", "464 0.94 0.46 1 0.48 \n", "4 0.90 0.42 1 0.48 \n", "68 0.96 0.46 1 0.50 \n", "341 0.84 0.32 1 0.52 \n", "306 0.90 0.32 1 0.58 "]}, "execution_count": 40, "metadata": {}, "output_type": "execute_result"}], "source": ["erreurs.tail()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Le mod\u00e8le 2 (word2vec) a l'air meilleur sur les phrases longues, le mod\u00e8le 1 (tf-idf) saisit mieux les mots positifs. A confirmer sur plus de donn\u00e9es. \n", "\n", "* Enlever les stop words, les signes de ponctuation.\n", "* Combiner les deux approches.\n", "* n-grammes\n", "* ...\n", "\n", "Derni\u00e8re analyse en regardant le taux d'erreur par source."]}, {"cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "
\n", " \n", "
\n", "
\n", "
sentance
\n", "
source
\n", "
model1
\n", "
model2
\n", "
label
\n", "
diff
\n", "
rep1
\n", "
rep2
\n", "
err1
\n", "
err2
\n", "
total
\n", "
\n", " \n", " \n", "
\n", "
850
\n", "
It looks very nice.
\n", "
amazon_cells_labelled
\n", "
0.62
\n", "
0.62
\n", "
1
\n", "
0.00
\n", "
1
\n", "
1
\n", "
0
\n", "
0
\n", "
1
\n", "
\n", "
\n", "
415
\n", "
As a European, the movie is a nice throwback t...
\n", "
imdb_labelled
\n", "
0.68
\n", "
0.54
\n", "
1
\n", "
0.14
\n", "
1
\n", "
1
\n", "
0
\n", "
0
\n", "
1
\n", "
\n", "
\n", "
585
\n", "
Great food and great service in a clean and fr...
\n", "
yelp_labelled
\n", "
0.94
\n", "
0.78
\n", "
1
\n", "
0.16
\n", "
1
\n", "
1
\n", "
0
\n", "
0
\n", "
1
\n", "
\n", "
\n", "
785
\n", "
This allows the possibility of double booking ...
\n", "
amazon_cells_labelled
\n", "
0.48
\n", "
0.54
\n", "
0
\n", "
-0.06
\n", "
0
\n", "
1
\n", "
0
\n", "
1
\n", "
1
\n", "
\n", "
\n", "
440
\n", "
Both do good jobs and are quite amusing.
\n", "
imdb_labelled
\n", "
0.64
\n", "
0.46
\n", "
1
\n", "
0.18
\n", "
1
\n", "
0
\n", "
0
\n", "
1
\n", "
1
\n", "
\n", " \n", "
\n", "
"], "text/plain": [" sentance source \\\n", "850 It looks very nice. amazon_cells_labelled \n", "415 As a European, the movie is a nice throwback t... imdb_labelled \n", "585 Great food and great service in a clean and fr... yelp_labelled \n", "785 This allows the possibility of double booking ... amazon_cells_labelled \n", "440 Both do good jobs and are quite amusing. imdb_labelled \n", "\n", " model1 model2 label diff rep1 rep2 err1 err2 total \n", "850 0.62 0.62 1 0.00 1 1 0 0 1 \n", "415 0.68 0.54 1 0.14 1 1 0 0 1 \n", "585 0.94 0.78 1 0.16 1 1 0 0 1 \n", "785 0.48 0.54 0 -0.06 0 1 0 1 1 \n", "440 0.64 0.46 1 0.18 1 0 0 1 1 "]}, "execution_count": 41, "metadata": {}, "output_type": "execute_result"}], "source": ["r1 = rf.predict(np_test)\n", "r2 = rfv.predict(np_test_v)\n", "final[\"rep1\"] = r1\n", "final[\"rep2\"] = r2\n", "final[\"err1\"] = (final.label - final.rep1).abs()\n", "final[\"err2\"] = (final.label - final.rep2).abs()\n", "final[\"total\"] = 1\n", "final.head()"]}, {"cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", "
\n", " \n", "
\n", "
\n", "
err1
\n", "
err2
\n", "
total
\n", "
\n", "
\n", "
source
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", "
\n", "
amazon_cells_labelled
\n", "
56
\n", "
94
\n", "
250
\n", "
\n", "
\n", "
imdb_labelled
\n", "
77
\n", "
107
\n", "
253
\n", "
\n", "
\n", "
yelp_labelled
\n", "
52
\n", "
88
\n", "
247
\n", "
\n", " \n", "
\n", "
"], "text/plain": [" err1 err2 total\n", "source \n", "amazon_cells_labelled 56 94 250\n", "imdb_labelled 77 107 253\n", "yelp_labelled 52 88 247"]}, "execution_count": 42, "metadata": {}, "output_type": "execute_result"}], "source": ["final[[\"source\", \"err1\", \"err2\", \"total\"]].groupby(\"source\").sum()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["*imdb* para\u00eet une source une peu plus difficile \u00e0 saisir. Quoiqu'il en soit, 2000 phrases pour apprendre est assez peu pour apprendre."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Versions utilis\u00e9es pour ce notebook\n", "\n", "[spacy](https://spacy.io/) s'est montr\u00e9 quelque peu fantasques cette ann\u00e9e avec quelques erreurs notamment celle-ci :\n", "[ValueError: cymem.cymem.Pool has the wrong size, try recompiling](https://github.com/explosion/spaCy/issues/2852). Voici les versions utilis\u00e9es..."]}, {"cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": ["def version(module, sub=True):\n", " try:\n", " ver = getattr(module, '__version__', None)\n", " if ver is None:\n", " ver = [_ for _ in os.listdir(os.path.join(module.__file__, '..', '..' 
if sub else '')) \\\n", " if module.__name__ in _ and 'dist' in _][-1]\n", " return ver\n", " except Exception as e:\n", " return str(e)"]}, {"cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["thinc 7.4.1\n", "preshed preshed-3.0.2.dist-info\n", "cymem cymem-2.0.2.dist-info\n", "murmurhash murmurhash-1.0.2.dist-info\n", "spacy 2.3.2\n", "msgpack msgpack_numpy-0.4.4.3.dist-info\n", "numpy 1.18.1\n"]}], "source": ["import os\n", "import thinc\n", "print(\"thinc\", version(thinc))\n", "import preshed\n", "print(\"preshed\", version(preshed))\n", "import cymem\n", "print(\"cymem\", version(cymem))\n", "import murmurhash\n", "print(\"murmurhash\", version(murmurhash))\n", "import spacy\n", "print(\"spacy\", spacy.__version__)\n", "\n", "import msgpack\n", "print(\"msgpack\", version(msgpack))\n", "import numpy\n", "print(\"numpy\", numpy.__version__)"]}, {"cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}