{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Big Data, Azure, Machine Learning, Python\n", "\n", "Presentation at Centrale Paris - June 2016."]}, {"cell_type": "markdown", "metadata": {}, "source": ["\n", "[Xavier Dupr\u00e9](http://www.xavierdupre.fr/)\n", "\n", "``xavier.dupre AT gmail.com``\n", "\n", "Senior Engineer at **Microsoft France** on [Azure ML](https://azure.microsoft.com/fr-fr/services/machine-learning/), **Teacher in Computer Science** at the [ENSAE](http://www.ensae.fr/)\n", "\n", "![Azure ML](logo_azureml.png) ![ENSAE](ENSAE_logo_developpe.jpg)"]}, {"cell_type": "code", "execution_count": 1, "metadata": {"collapsed": false, "slideshow": {"slide_type": "subslide"}}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {"collapsed": true, "slideshow": {"slide_type": "slide"}}, "source": ["## Introduction\n", "\n", "**Experience**\n", "\n", "* Microsoft, Bing, Azure\n", "* Teaching: ENSAE, professional training (Institut des actuaires) (200+ students, 60+ hours of classes)\n", "\n", "**Expertise**\n", "\n", "* Machine Learning (PhD)\n", "* Python\n", "* Map Reduce\n", "* Azure"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["**Microsoft in universities**\n", "\n", "* [Microsoft, partenaire de la fili\u00e8re Data Science de l\u2019ENSAE ParisTech avec Microsoft Azure Machine Learning](http://news.microsoft.com/fr-fr/2014/09/23/microsoft-partenaire-de-la-filiere-data-science-de-lensae-paristech-avec-microsoft-azure-machine-learning/#sm.00003ys2fe13deeauvpdv9gdiv2oi) (2014)\n", "* [Developing the Next Wave of Data Scientists](https://blogs.technet.microsoft.com/machinelearning/2016/06/14/developing-the-next-wave-of-data-scientists/) (2015-2016)\n", "* Microsoft is one of the sponsors of the [DataScienceGame](http://www.datasciencegame.com/sponsors) (2016)\n", "\n", "[![Microsoft - ENSAE - Hackathon](img_hack.png)](https://www.youtube.com/embed/Y1UKAbbExn8)"]}, {"cell_type": "markdown", "metadata": {"slideshow": {"slide_type": "subslide"}}, "source": ["### What would you do if ...?\n", "\n", "* Descriptive statistics on a 1 GB, 10 GB, or 100 GB file?\n", "* Train a logistic regression on ... ?\n", "* Train a random forest on ... 
?\n", "* What if you had to do it every week?\n", "* How do you plot 10M points on a map?\n", "\n", "Student feedback on the hackathon:\n", "\n", "*It was good to face data that was not very clean.*"]}, {"cell_type": "markdown", "metadata": {"collapsed": true, "slideshow": {"slide_type": "slide"}}, "source": ["### Starting a Hadoop cluster on Azure"]}, {"cell_type": "markdown", "metadata": {"collapsed": true, "slideshow": {"slide_type": "slide"}}, "source": ["## Case study 1: recommender system, search engine\n", "\n", "* Recommender system, applied at Bing to related searches\n", "* A few theoretical approaches\n", "* Putting it into practice with large volumes of data\n", "* Optimization on the Internet, reinforcement learning"]}, {"cell_type": "markdown", "metadata": {"collapsed": true, "slideshow": {"slide_type": "slide"}}, "source": ["## Big Data tools at Microsoft / Azure\n", "\n", "* Map/Reduce cluster, Azure ML, virtual machine, Power BI\n", "* Three usage examples (virtual machine, Azure ML, Hadoop cluster) on academic datasets\n", "* A few exercises"]}, {"cell_type": "markdown", "metadata": {"collapsed": true, "slideshow": {"slide_type": "slide"}}, "source": ["## Case study 2: suggestions on the Internet\n", "\n", "* Thoughts on the impact of a suggestion system\n", "\n", " * On users\n", " * On logs\n", " \n", "* Metrics: how to measure the impact?\n", "\n", "![suggestion](img_bing.png)"]}, {"cell_type": "code", "execution_count": 2, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"celltoolbar": "Slideshow", "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": 
".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2"}}, "nbformat": 4, "nbformat_minor": 2}