{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Ideas on City Bike Challenge\n", "\n", "Based on the data available at [Divvy Data](https://www.divvybikes.com/system-data), how can we guess where people usually live and where they usually work? This notebook suggests some directions and shows some results."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## The data\n", "\n", "[Divvy Data](https://www.divvybikes.com/system-data) publishes a sample of the data. "]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": ["from pyensae.datasource import download_data\n", "file = download_data(\"Divvy_Trips_2016_Q3Q4.zip\", url=\"https://s3.amazonaws.com/divvy-data/tripdata/\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We know the stations."]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "\n", "
\n", " \n", " \n", " \n", " id \n", " name \n", " latitude \n", " longitude \n", " dpcapacity \n", " online_date \n", " \n", " \n", " \n", " \n", " 0 \n", " 456 \n", " 2112 W Peterson Ave \n", " 41.991178 \n", " -87.683593 \n", " 15 \n", " 5/12/2015 \n", " \n", " \n", " 1 \n", " 101 \n", " 63rd St Beach \n", " 41.781016 \n", " -87.576120 \n", " 23 \n", " 4/20/2015 \n", " \n", " \n", " 2 \n", " 109 \n", " 900 W Harrison St \n", " 41.874675 \n", " -87.650019 \n", " 19 \n", " 8/6/2013 \n", " \n", " \n", " 3 \n", " 21 \n", " Aberdeen St & Jackson Blvd \n", " 41.877726 \n", " -87.654787 \n", " 15 \n", " 6/21/2013 \n", " \n", " \n", " 4 \n", " 80 \n", " Aberdeen St & Monroe St \n", " 41.880420 \n", " -87.655599 \n", " 19 \n", " 6/26/2013 \n", " \n", " \n", "
\n", "
"], "text/plain": [" id name latitude longitude dpcapacity \\\n", "0 456 2112 W Peterson Ave 41.991178 -87.683593 15 \n", "1 101 63rd St Beach 41.781016 -87.576120 23 \n", "2 109 900 W Harrison St 41.874675 -87.650019 19 \n", "3 21 Aberdeen St & Jackson Blvd 41.877726 -87.654787 15 \n", "4 80 Aberdeen St & Monroe St 41.880420 -87.655599 19 \n", "\n", " online_date \n", "0 5/12/2015 \n", "1 4/20/2015 \n", "2 8/6/2013 \n", "3 6/21/2013 \n", "4 6/26/2013 "]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["import pandas\n", "stations = df = pandas.read_csv(\"Divvy_Stations_2016_Q3.csv\")\n", "df.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["And we know the trips."]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "\n", "
\n", " \n", " \n", " \n", " trip_id \n", " starttime \n", " stoptime \n", " bikeid \n", " tripduration \n", " from_station_id \n", " from_station_name \n", " to_station_id \n", " to_station_name \n", " usertype \n", " gender \n", " birthyear \n", " \n", " \n", " \n", " \n", " 0 \n", " 12150160 \n", " 9/30/2016 23:59:58 \n", " 10/1/2016 00:04:03 \n", " 4959 \n", " 245 \n", " 69 \n", " Damen Ave & Pierce Ave \n", " 17 \n", " Wood St & Division St \n", " Subscriber \n", " Male \n", " 1988.0 \n", " \n", " \n", " 1 \n", " 12150159 \n", " 9/30/2016 23:59:58 \n", " 10/1/2016 00:04:09 \n", " 2589 \n", " 251 \n", " 383 \n", " Ashland Ave & Harrison St \n", " 320 \n", " Loomis St & Lexington St \n", " Subscriber \n", " Female \n", " 1990.0 \n", " \n", " \n", " 2 \n", " 12150158 \n", " 9/30/2016 23:59:51 \n", " 10/1/2016 00:24:51 \n", " 3656 \n", " 1500 \n", " 302 \n", " Sheffield Ave & Wrightwood Ave \n", " 334 \n", " Lake Shore Dr & Belmont Ave \n", " Customer \n", " NaN \n", " NaN \n", " \n", " \n", " 3 \n", " 12150157 \n", " 9/30/2016 23:59:51 \n", " 10/1/2016 00:03:56 \n", " 3570 \n", " 245 \n", " 475 \n", " Washtenaw Ave & Lawrence Ave \n", " 471 \n", " Francisco Ave & Foster Ave \n", " Subscriber \n", " Female \n", " 1988.0 \n", " \n", " \n", " 4 \n", " 12150156 \n", " 9/30/2016 23:59:32 \n", " 10/1/2016 00:26:50 \n", " 3158 \n", " 1638 \n", " 302 \n", " Sheffield Ave & Wrightwood Ave \n", " 492 \n", " Leavitt St & Addison St \n", " Customer \n", " NaN \n", " NaN \n", " \n", " \n", "
\n", "
"], "text/plain": [" trip_id starttime stoptime bikeid tripduration \\\n", "0 12150160 9/30/2016 23:59:58 10/1/2016 00:04:03 4959 245 \n", "1 12150159 9/30/2016 23:59:58 10/1/2016 00:04:09 2589 251 \n", "2 12150158 9/30/2016 23:59:51 10/1/2016 00:24:51 3656 1500 \n", "3 12150157 9/30/2016 23:59:51 10/1/2016 00:03:56 3570 245 \n", "4 12150156 9/30/2016 23:59:32 10/1/2016 00:26:50 3158 1638 \n", "\n", " from_station_id from_station_name to_station_id \\\n", "0 69 Damen Ave & Pierce Ave 17 \n", "1 383 Ashland Ave & Harrison St 320 \n", "2 302 Sheffield Ave & Wrightwood Ave 334 \n", "3 475 Washtenaw Ave & Lawrence Ave 471 \n", "4 302 Sheffield Ave & Wrightwood Ave 492 \n", "\n", " to_station_name usertype gender birthyear \n", "0 Wood St & Division St Subscriber Male 1988.0 \n", "1 Loomis St & Lexington St Subscriber Female 1990.0 \n", "2 Lake Shore Dr & Belmont Ave Customer NaN NaN \n", "3 Francisco Ave & Foster Ave Subscriber Female 1988.0 \n", "4 Leavitt St & Addison St Customer NaN NaN "]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["bikes = df = pandas.read_csv(\"Divvy_Trips_2016_Q3.csv\")\n", "df.head()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## First assumption\n", "\n", "Which time do you go to work? How do you go to work? Maybe by bicycles, maybe in the morning, everyday... 
Let's look at the distribution of stop times at each station for each day of the week."]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": ["import pandas\n", "from datetime import datetime, time\n", "df[\"dtstart\"] = pandas.to_datetime(df.starttime, infer_datetime_format=True)\n", "df[\"dtstop\"] = pandas.to_datetime(df.stoptime, infer_datetime_format=True)\n", "df[\"stopday\"] = df.dtstop.apply(lambda r: datetime(r.year, r.month, r.day))\n", "df[\"stoptime\"] = df.dtstop.apply(lambda r: time(r.hour, r.minute, 0))\n", "df[\"stoptime10\"] = df.dtstop.apply(lambda r: time(r.hour, (r.minute // 10)*10, 0)) # every 10 minutes"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": ["df['stopweekday'] = df['dtstop'].dt.dayofweek"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Big stations\n", "\n", "We compute the average number of bicycles that stop at each station per day."]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "\n", "
\n", " \n", " \n", " \n", " to_station_name \n", " to_station_id \n", " mean stops per day \n", " \n", " \n", " \n", " \n", " 0 \n", " 2112 W Peterson Ave \n", " 456 \n", " 2.532468 \n", " \n", " \n", " 1 \n", " 63rd St Beach \n", " 101 \n", " 6.316456 \n", " \n", " \n", " 2 \n", " 900 W Harrison St \n", " 109 \n", " 17.347826 \n", " \n", " \n", " 3 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 36.402174 \n", " \n", " \n", " 4 \n", " Aberdeen St & Monroe St \n", " 80 \n", " 40.793478 \n", " \n", " \n", "
\n", "
"], "text/plain": [" to_station_name to_station_id mean stops per day\n", "0 2112 W Peterson Ave 456 2.532468\n", "1 63rd St Beach 101 6.316456\n", "2 900 W Harrison St 109 17.347826\n", "3 Aberdeen St & Jackson Blvd 21 36.402174\n", "4 Aberdeen St & Monroe St 80 40.793478"]}, "execution_count": 9, "metadata": {}, "output_type": "execute_result"}], "source": ["key = [\"to_station_name\", \"to_station_id\", \"stopday\"]\n", "perday = df[key + [\"trip_id\"]].groupby(key, as_index=False).count()\n", "# average only the daily counts, so that the stopday column is not aggregated\n", "ave = perday.groupby(key[:-1], as_index=False)[\"trip_id\"].mean()\n", "ave.columns = \"to_station_name\", \"to_station_id\", \"mean stops per day\"\n", "ave.head()"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"image/png": "\n", "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["ax = ave[\"mean stops per day\"].hist(bins=50, figsize=(14,4))\n", "ax.set_xlim([0, 300])\n", "ax.set_yscale(\"log\")\n", "ax.set_title(\"mean stops per day\")\n", "ax.set_xlabel(\"mean stops per day\")\n", "ax.set_ylabel(\"number of stations\");"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/plain": ["((236, 3), (581, 3))"]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["ave[ave[\"mean stops per day\"] > 20].shape, ave.shape"]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"data": {"text/plain": ["(Timestamp('2016-07-01 00:03:37'), Timestamp('2016-10-01 09:24:02'))"]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["df.dtstop.min(), df.dtstop.max()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["For about 40% of the stations (236 out of 581), more than 20 bicycles stop per day on average between July and October 2016. Let's take one of them."]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "\n", "
\n", " \n", " \n", " \n", " to_station_name \n", " to_station_id \n", " mean stops per day \n", " \n", " \n", " \n", " \n", " 3 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 36.402174 \n", " \n", " \n", " 4 \n", " Aberdeen St & Monroe St \n", " 80 \n", " 40.793478 \n", " \n", " \n", " 5 \n", " Ada St & Washington Blvd \n", " 346 \n", " 34.728261 \n", " \n", " \n", " 6 \n", " Adler Planetarium \n", " 341 \n", " 114.478261 \n", " \n", " \n", " 9 \n", " Albany Ave & Bloomingdale Ave \n", " 511 \n", " 20.913043 \n", " \n", " \n", "
\n", "
"], "text/plain": [" to_station_name to_station_id mean stops per day\n", "3 Aberdeen St & Jackson Blvd 21 36.402174\n", "4 Aberdeen St & Monroe St 80 40.793478\n", "5 Ada St & Washington Blvd 346 34.728261\n", "6 Adler Planetarium 341 114.478261\n", "9 Albany Ave & Bloomingdale Ave 511 20.913043"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["ave[ave[\"mean stops per day\"] > 20].head()"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "\n", "
\n", " \n", " \n", " \n", " to_station_name \n", " to_station_id \n", " stopweekday \n", " stoptime10 \n", " nb_stops \n", " \n", " \n", " \n", " \n", " 925 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 0 \n", " 05:10:00 \n", " 4 \n", " \n", " \n", " 926 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 0 \n", " 06:00:00 \n", " 1 \n", " \n", " \n", " 927 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 0 \n", " 06:10:00 \n", " 4 \n", " \n", " \n", " 928 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 0 \n", " 06:20:00 \n", " 2 \n", " \n", " \n", " 929 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 0 \n", " 06:40:00 \n", " 1 \n", " \n", " \n", "
\n", "
"], "text/plain": [" to_station_name to_station_id stopweekday stoptime10 \\\n", "925 Aberdeen St & Jackson Blvd 21 0 05:10:00 \n", "926 Aberdeen St & Jackson Blvd 21 0 06:00:00 \n", "927 Aberdeen St & Jackson Blvd 21 0 06:10:00 \n", "928 Aberdeen St & Jackson Blvd 21 0 06:20:00 \n", "929 Aberdeen St & Jackson Blvd 21 0 06:40:00 \n", "\n", " nb_stops \n", "925 4 \n", "926 1 \n", "927 4 \n", "928 2 \n", "929 1 "]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["key = [\"to_station_name\", \"to_station_id\", \"stopweekday\", \"stoptime10\"]\n", "snippet = df[key + [\"trip_id\"]].groupby(key, as_index=False).count()\n", "snippet.columns = \"to_station_name\", \"to_station_id\", \"stopweekday\", \"stoptime10\", \"nb_stops\"\n", "snippet21 = snippet[snippet[\"to_station_id\"] == 21].copy()\n", "snippet21.head()"]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "\n", "
\n", " \n", " \n", " \n", " to_station_name \n", " to_station_id \n", " stopweekday \n", " stoptime10 \n", " nb_stops \n", " \n", " \n", " \n", " \n", " 1621 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 6 \n", " 22:10:00 \n", " 7 \n", " \n", " \n", " 1622 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 6 \n", " 22:20:00 \n", " 1 \n", " \n", " \n", " 1623 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 6 \n", " 22:30:00 \n", " 2 \n", " \n", " \n", " 1624 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 6 \n", " 22:50:00 \n", " 1 \n", " \n", " \n", " 1625 \n", " Aberdeen St & Jackson Blvd \n", " 21 \n", " 6 \n", " 23:40:00 \n", " 1 \n", " \n", " \n", "
\n", "
"], "text/plain": [" to_station_name to_station_id stopweekday stoptime10 \\\n", "1621 Aberdeen St & Jackson Blvd 21 6 22:10:00 \n", "1622 Aberdeen St & Jackson Blvd 21 6 22:20:00 \n", "1623 Aberdeen St & Jackson Blvd 21 6 22:30:00 \n", "1624 Aberdeen St & Jackson Blvd 21 6 22:50:00 \n", "1625 Aberdeen St & Jackson Blvd 21 6 23:40:00 \n", "\n", " nb_stops \n", "1621 7 \n", "1622 1 \n", "1623 2 \n", "1624 1 \n", "1625 1 "]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["snippet21.tail()"]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"data": {"image/png": "\n", "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["from ensae_projects.datainc.data_bikes import add_missing_time\n", "import matplotlib.pyplot as plt\n", "fig, ax = plt.subplots(1,1, figsize=(14,4))\n", "full_snippet21 = add_missing_time(snippet21, \"stoptime10\", delay=10, values=\"nb_stops\")\n", "full_snippet21[\"rstops\"] = full_snippet21.nb_stops.rolling(7).mean()\n", "full_snippet21.plot(x=\"stoptime10\", y=\"rstops\", figsize=(14,4), kind=\"area\", ax=ax)\n", "ax.set_title(\"station 21\");"]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [{"data": {"image/png": "\n", "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["from ensae_projects.datainc.data_bikes import add_missing_time\n", "import matplotlib.pyplot as plt\n", "fig, ax = plt.subplots(1,1, figsize=(14,4))\n", "sni = snippet[snippet[\"to_station_id\"] == 341].copy()\n", "sni = add_missing_time(sni, \"stoptime10\", delay=10, values=\"nb_stops\")\n", "sni[\"rstops\"] = sni.nb_stops.rolling(7).mean()\n", "sni.plot(x=\"stoptime10\", y=\"rstops\", figsize=(14,4), kind=\"area\", ax=ax)\n", "ax.set_title(\"station 341\");"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Indicator\n", "\n", "If a station is inside working area, people will arrive in the morning and will leave in the 
evening. If it is a living area, people should leave in the morning and arrive in the evening. Let $L_t$ denote the number of bicycles arriving at a station at time $t$. Let's compute the share of arrivals that happen in the morning; a high $R$ suggests a working area, a low $R$ a living area:\n", "\n", "$$R = \\frac{\\sum_{t=\\text{08:00}}^{\\text{12:00}} L_t}{\\sum_{t=\\text{00:00}}^{\\text{24:00}} L_t}$$"]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.1803523439832786"]}, "execution_count": 18, "metadata": {}, "output_type": "execute_result"}], "source": ["col = full_snippet21[\"stoptime10\"]\n", "R21 = full_snippet21.nb_stops[(col > time(8,0,0)) & (col < time(12,0,0))].sum() / \\\n", "      full_snippet21.nb_stops[col >= time(0,0,0)].sum()\n", "R21"]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": ["key = [\"to_station_name\", \"to_station_id\", \"stopweekday\", \"stoptime10\"]\n", "agg = bikes[key + [\"trip_id\"]].groupby(key, as_index=False).count().copy()\n", "\n", "ratios = {}\n", "for ids in set(stations.id):\n", "    sni = agg[agg[\"to_station_id\"] == ids]\n", "    sni.columns = \"to_station_name\", \"to_station_id\", \"stopweekday\", \"stoptime10\", \"nb_stops\"\n", "    num = sni.nb_stops[(sni[\"stoptime10\"] >= time(8,0,0)) & (sni[\"stoptime10\"] <= time(12,0,0))].sum()\n", "    den = sni.nb_stops[sni[\"stoptime10\"] >= time(0,0,0)].sum()\n", "    if den > 0:\n", "        ratios[ids] = num * 1.0 / den"]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "\n", "
\n", " \n", " \n", " \n", " id \n", " name \n", " latitude \n", " longitude \n", " dpcapacity \n", " online_date \n", " ratio \n", " \n", " \n", " \n", " \n", " 0 \n", " 456 \n", " 2112 W Peterson Ave \n", " 41.991178 \n", " -87.683593 \n", " 15 \n", " 5/12/2015 \n", " 0.102564 \n", " \n", " \n", " 1 \n", " 101 \n", " 63rd St Beach \n", " 41.781016 \n", " -87.576120 \n", " 23 \n", " 4/20/2015 \n", " 0.254509 \n", " \n", " \n", " 2 \n", " 109 \n", " 900 W Harrison St \n", " 41.874675 \n", " -87.650019 \n", " 19 \n", " 8/6/2013 \n", " 0.392231 \n", " \n", " \n", " 3 \n", " 21 \n", " Aberdeen St & Jackson Blvd \n", " 41.877726 \n", " -87.654787 \n", " 15 \n", " 6/21/2013 \n", " 0.196178 \n", " \n", " \n", " 4 \n", " 80 \n", " Aberdeen St & Monroe St \n", " 41.880420 \n", " -87.655599 \n", " 19 \n", " 6/26/2013 \n", " 0.192379 \n", " \n", " \n", "
\n", "
"], "text/plain": [" id name latitude longitude dpcapacity \\\n", "0 456 2112 W Peterson Ave 41.991178 -87.683593 15 \n", "1 101 63rd St Beach 41.781016 -87.576120 23 \n", "2 109 900 W Harrison St 41.874675 -87.650019 19 \n", "3 21 Aberdeen St & Jackson Blvd 41.877726 -87.654787 15 \n", "4 80 Aberdeen St & Monroe St 41.880420 -87.655599 19 \n", "\n", " online_date ratio \n", "0 5/12/2015 0.102564 \n", "1 4/20/2015 0.254509 \n", "2 8/6/2013 0.392231 \n", "3 6/21/2013 0.196178 \n", "4 6/26/2013 0.192379 "]}, "execution_count": 20, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy\n", "stations_ratio = stations.copy()\n", "stations_ratio[\"ratio\"] = stations.id.apply(lambda x: ratios.get(x, numpy.nan))\n", "stations_ratio.head()"]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD8CAYAAABn919SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAENpJREFUeJzt3X+MZXdZx/H3Q0uldKBbKB032+pALISmGyB709Q06gwLBqlp+0chkGK2umESVCRho6yYGH8mi6YiURLZtMBgCtOK4G4KSpqyV9DQyq4tLG1tWsqk9IddoLsrgwiuPP4xp3Xs3Lvn3Dv313zn/Uo2c8+Z77n32WdmP/ud75xzbmQmkqSN7znjLkCSNBgGuiQVwkCXpEIY6JJUCANdkgphoEtSIQx0SSqEgS5JhTDQJakQZzYZFBFbgBuBS4EEfgV4ALgFmAGWgDdn5vHTPc/555+fMzMzfRX6ve99j3POOaevYzcLe1TPHtWzR/VG3aMjR458OzNfUjcumlz6HxELwBcz88aIOAt4PvBe4KnM3BcRe4HzMvM9p3ueVquVhw8fbvY3eJZ2u83s7Gxfx24W9qiePapnj+qNukcRcSQzW3XjapdcIuKFwM8CNwFk5g8z8wRwNbBQDVsArum/XEnSejVZQ38Z8C3gIxFxd0TcGBHnANOZ+QRA9fGCIdYpSapRu+QSES3gTuCKzLwrIj4A/Afwzszcsmrc8cw8r8Px88A8wPT09I7FxcW+Cl1eXmZqaqqvYzcLe1TPHtWzR/VG3aO5ublGSy5k5mn/AD8OLK3a/hngM6z8UnRrtW8r8EDdc+3YsSP7dejQob6P3SzsUT17VM8e1Rt1j4DDWZOvmVm/5JKZ/w58MyJeUe3aCdwHHAR2Vft2AQea/38jSRq0RqctAu8Ebq7OcHkY+GVW1t9vjYjdwCPAm4ZToiSpiUaBnpn3AJ3Wb3YOthxJUr+8UlSSCmGgS1Ihmq6hawBm9n6m4/6lfVeOuBJJJXKGLkmFMNAlqRAGuiQVwkCXpEIY6JJUCANdkgphoEtSIQx0SSqEgS5JhTDQJakQBrokFcJAl6RCGOiSVAjvtjgE3e6qKEnD5AxdkgphoEtSIQx0
SSqEa+gTzHc4ktQLZ+iSVAgDXZIKYaBLUiEMdEkqhIEuSYUw0CWpEI1OW4yIJeC7wP8ApzKzFREvAm4BZoAl4M2ZeXw4ZZbNWwVIGoReZuhzmfnqzGxV23uBOzLzYuCOaluSNCbrWXK5GlioHi8A16y/HElSvyIz6wdFfAM4DiTwoczcHxEnMnPLqjHHM/O8DsfOA/MA09PTOxYXF/sqdHl5mampqb6OHbWjj50c6vNv33Zux/0bqUfjYo/q2aN6o+7R3NzckVWrI101vfT/isx8PCIuAG6PiH9rWkhm7gf2A7RarZydnW166P/Tbrfp99hRu37Ia+JL18123L+RejQu9qiePao3qT1qtOSSmY9XH48BnwYuA56MiK0A1cdjwypSklSvNtAj4pyIeMHTj4GfB74GHAR2VcN2AQeGVaQkqV6TJZdp4NMR8fT4j2fmP0TEl4FbI2I38AjwpuGVKUmqUxvomfkw8KoO+78D7BxGUZKk3nmlqCQVwkCXpEIY6JJUCANdkgphoEtSIQx0SSqEgS5JhTDQJakQBrokFaLp3RY3rdO9m9DSvitHWIkknZ4zdEkqhIEuSYUw0CWpEAa6JBXCQJekQhjoklQIA12SCmGgS1IhDHRJKoRXiq7D6a4ilaRRc4YuSYUw0CWpEAa6JBXCQJekQhjoklQIA12SCtE40CPijIi4OyJuq7ZfGhF3RcSDEXFLRJw1vDIlSXV6maG/C7h/1fb7gPdn5sXAcWD3IAuTJPWmUaBHxIXAlcCN1XYArwU+WQ1ZAK4ZRoGSpGaaztD/HPgt4EfV9ouBE5l5qtp+FNg24NokST2ovfQ/In4ROJaZRyJi9undHYZml+PngXmA6elp2u12X4UuLy/3fex67Nl+qn7QiHXrw7h6tJHYo3r2qN6k9qjJvVyuAK6KiDcCzwNeyMqMfUtEnFnN0i8EHu90cGbuB/YDtFqtnJ2d7avQdrtNv8eux/UTeL+WpetmO+4fV482EntUzx7Vm9Qe1S65ZOZvZ+aFmTkDvAX4fGZeBxwCrq2G7QIODK1KSVKt9ZyH/h7g3RHxECtr6jcNpiRJUj96un1uZraBdvX4YeCywZckSeqHV4pKUiEMdEkqhIEuSYUw0CWpEAa6JBXCQJekQhjoklQIA12SCmGgS1IhDHRJKoSBLkmFMNAlqRAGuiQVoqe7LWoyzHR504092091fEOOpX1XDrskSRPAGbokFcJAl6RCGOiSVAgDXZIKYaBLUiEMdEkqhIEuSYUw0CWpEAa6JBXCQJekQhjoklQIA12SCmGgS1IhagM9Ip4XEf8SEV+JiHsj4ver/S+NiLsi4sGIuCUizhp+uZKkbprM0H8AvDYzXwW8GnhDRFwOvA94f2ZeDBwHdg+vTElSndpAzxXL1eZzqz8JvBb4ZLV/AbhmKBVKkhpptIYeEWdExD3AMeB24OvAicw8VQ15FNg2nBIlSU1EZjYfHLEF+DTwu8BHMvOnqv0XAZ/NzO0djpkH5gGmp6d3LC4u9lXo8vIyU1NTfR27HkcfOzny1+zX9Nnw5PfX7t++7dzRFzOhxvV9tJHYo3qj7tHc3NyRzGzVjevpLegy80REtIHLgS0RcWY1S78QeLzLMfuB/QCtVitnZ2d7eclntNtt+j12PTq9pduk2rP9FDccXfslXbpudvTFTKhxfR9tJPao3qT2qMlZLi+pZuZExNnA64D7gUPAtdWwXcCBYRUpSarXZIa+FViIiDNY+Q/g1sy8LSLuAxYj4o+Au4GbhlinJKlGbaBn5leB13TY/zBw2TCKkiT1zitFJakQBrokFaKns1xUlpkuZ/As7btyxJVIGgRn6JJUCANdkgphoEtSIQx0SSqEgS5JhTDQJakQBrokFcJAl6RCGOiSVAgDXZIKYaBLUiEMdEkqhIEuSYUw0CWpEAa6JBXCQJekQhjoklQIA12SCmGgS1IhDHRJKoSBLkmFMNAlqRAGuiQVwkCXpELUBnpEXBQR
hyLi/oi4NyLeVe1/UUTcHhEPVh/PG365kqRumszQTwF7MvOVwOXAr0XEJcBe4I7MvBi4o9qWJI1JbaBn5hOZ+a/V4+8C9wPbgKuBhWrYAnDNsIqUJNWLzGw+OGIG+AJwKfBIZm5Z9bnjmblm2SUi5oF5gOnp6R2Li4t9Fbq8vMzU1FRfx67H0cdOjvw1+zV9Njz5/bX7t287t+P4bn+3buNLMK7vo43EHtUbdY/m5uaOZGarblzjQI+IKeAfgT/OzE9FxIkmgb5aq9XKw4cPN3q9Z2u328zOzvZ17HrM7P3MyF+zX3u2n+KGo2eu2b+078qO47v93bqNL8G4vo82EntUb9Q9iohGgd7oLJeIeC7wt8DNmfmpaveTEbG1+vxW4Fi/xUqS1q/JWS4B3ATcn5l/tupTB4Fd1eNdwIHBlydJamrtz+drXQH8EnA0Iu6p9r0X2AfcGhG7gUeANw2nxNHYSEsrktRJbaBn5j8B0eXTOwdbjiSpX14pKkmFMNAlqRAGuiQVwkCXpEIY6JJUiCanLWqDG9Qpmad7npKvLpU2CmfoklQIA12SCmGgS1IhXEPXGt4GQdqYnKFLUiEMdEkqhIEuSYUw0CWpEAa6JBXCQJekQnjaooZqM74RtTQuztAlqRAGuiQVwkCXpEIY6JJUCANdkgphoEtSITxtUQPhHRql8XOGLkmFMNAlqRC1gR4RH46IYxHxtVX7XhQRt0fEg9XH84ZbpiSpTpM19I8Cfwl8bNW+vcAdmbkvIvZW2+8ZfHn/5+hjJ7m+wzqtl5CXxVsFSP2rnaFn5heAp561+2pgoXq8AFwz4LokST3qdw19OjOfAKg+XjC4kiRJ/YjMrB8UMQPclpmXVtsnMnPLqs8fz8yO6+gRMQ/MA0xPT+9YXFzsq9BjT53kye+v3b9927k9Pc/Rx0729fobwfTZdOzRJOr2dev29en169zN8vIyU1NTA3muUtmjeqPu0dzc3JHMbNWN6/c89CcjYmtmPhERW4Fj3QZm5n5gP0Cr1crZ2dm+XvAvbj7ADUfXlrt0XW/P12kdvhR7tp/q2KNJ1O3r1u3r0+vXuZt2u02/34ObhT2qN6k96nfJ5SCwq3q8CzgwmHIkSf1qctriJ4AvAa+IiEcjYjewD3h9RDwIvL7aliSNUe3P55n51i6f2jngWiRJ6+CVopJUCANdkgqxMU6JUHG8O6M0eM7QJakQBrokFcJAl6RCGOiSVAgDXZIKYaBLUiE8bVEbgm98IdVzhi5JhTDQJakQBrokFcI1dKkPrulrEjlDl6RCGOiSVAgDXZIKUewaurdn3Rx6Xcs++tjJnt4o3DVxbSTO0CWpEAa6JBWi2CUXaRB6Xbo73fhuyzeeAqlBcYYuSYUw0CWpEAa6JBXCNXQVqdu69J7tIy5kFU+l1bA5Q5ekQhjoklSIdS25RMQbgA8AZwA3Zua+gVTVA3+MVak8nXHjGffXrO8ZekScAXwQ+AXgEuCtEXHJoAqTJPVmPUsulwEPZebDmflDYBG4ejBlSZJ6tZ5A3wZ8c9X2o9U+SdIYrGcNPTrsyzWDIuaB+WpzOSIe6PP1zge+3eexm8Jv2KNaJfQo3jf0l9jwPRqBnno0gK/ZTzYZtJ5AfxS4aNX2hcDjzx6UmfuB/et4HQAi4nBmttb7PCWzR/XsUT17VG9Se7SeJZcvAxdHxEsj4izgLcDBwZQlSepV3zP0zDwVEb8OfI6V0xY/nJn3DqwySVJP1nUeemZ+FvjsgGqps+5lm03AHtWzR/XsUb2J7FFkrvk9piRpA/LSf0kqxMQFekS8ISIeiIiHImJvh8//WETcUn3+roiYGX2V49WgR++OiPsi4qsRcUdENDrlqSR1PVo17tqIyIiYuDMWhq1JjyLizdX30r0R8fFR1zhuDf6t/UREHIqIu6t/b28cR53PyMyJ+cPKL1e/DrwMOAv4CnDJs8b8KvBX1eO3ALeMu+4J7NEc8Pzq8Tvs0doeVeNeAHwB
uBNojbvuSesRcDFwN3BetX3BuOuewB7tB95RPb4EWBpnzZM2Q29yO4GrgYXq8SeBnRHR6SKnUtX2KDMPZeZ/Vpt3snKNwGbS9LYUfwj8CfBfoyxuQjTp0duBD2bmcYDMPDbiGsetSY8SeGH1+Fw6XIszSpMW6E1uJ/DMmMw8BZwEXjyS6iZDr7dc2A38/VArmjy1PYqI1wAXZeZtoyxsgjT5Pno58PKI+OeIuLO6u+pm0qRHvwe8LSIeZeWMv3eOprTOJu0di5rcTqDRLQcK1vjvHxFvA1rAzw21oslz2h5FxHOA9wPXj6qgCdTk++hMVpZdZln5Ke+LEXFpZp4Ycm2TokmP3gp8NDNviIifBv666tGPhl/eWpM2Q29yO4FnxkTEmaz8mPPUSKqbDI1uuRARrwN+B7gqM38wotomRV2PXgBcCrQjYgm4HDi4yX4x2vTf2oHM/O/M/AbwACsBv1k06dFu4FaAzPwS8DxW7vMyFpMW6E1uJ3AQ2FU9vhb4fFa/kdgkantULSd8iJUw32zrnlDTo8w8mZnnZ+ZMZs6w8nuGqzLz8HjKHYsm/9b+jpVfsBMR57OyBPPwSKscryY9egTYCRARr2Ql0L810ipXmahAr9bEn76dwP3ArZl5b0T8QURcVQ27CXhxRDwEvBvoekpaiRr26E+BKeBvIuKeiNhU99hp2KNNrWGPPgd8JyLuAw4Bv5mZ3xlPxaPXsEd7gLdHxFeATwDXj3OC6ZWiklSIiZqhS5L6Z6BLUiEMdEkqhIEuSYUw0CWpEAa6JBXCQJekQhjoklSI/wXNSEQdgOVPGQAAAABJRU5ErkJggg==\n", "text/plain": [""]}, "metadata": {}, "output_type": "display_data"}], "source": ["stations_ratio[\"ratio\"].hist(bins=50);"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We choose the median."]}, {"cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.1648655139289145"]}, "execution_count": 22, "metadata": {}, "output_type": "execute_result"}], "source": ["stations_ratio.ratio.median()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["And we draw a map."]}, {"cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [{"data": {"text/html": [""], "text/plain": [".CustomFoliumMap at 0x1fa01cb0e48>"]}, "execution_count": 23, "metadata": {}, "output_type": "execute_result"}], "source": ["from ensae_projects.datainc.data_bikes import folium_html_stations_map\n", "xy = []\n", "for els in stations_ratio.apply(lambda row: (row[\"latitude\"], row[\"longitude\"], row[\"ratio\"], row[\"name\"]), axis=1):\n", " name = \"%s %1.2f\" % (els[3], els[2])\n", " color = \"red\" if els[2] >= 0.1648655139289145 else \"blue\"\n", " xy.append( ( (els[0], els[1]), (name, color)))\n", 
"folium_html_stations_map(xy, width=\"80%\")"]}, {"cell_type": "markdown", "metadata": {}, "source": ["If we assume that homes take up more space than workplaces, there should be more living areas than working areas. We therefore raise the threshold from the median to the 60% quantile."]}, {"cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [{"data": {"text/plain": ["0.18241042345276873"]}, "execution_count": 24, "metadata": {}, "output_type": "execute_result"}], "source": ["stations_ratio.ratio.quantile(0.6)"]}, {"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [{"data": {"text/html": [""], "text/plain": [".CustomFoliumMap at 0x1fa02425358>"]}, "execution_count": 25, "metadata": {}, "output_type": "execute_result"}], "source": ["from ensae_projects.datainc.data_bikes import folium_html_stations_map\n", "xy = []\n", "for els in stations_ratio.apply(lambda row: (row[\"latitude\"], row[\"longitude\"], row[\"ratio\"], row[\"name\"]), axis=1):\n", "    name = \"%s %1.2f\" % (els[3], els[2])\n", "    color = \"red\" if els[2] >= 0.18241042345276873 else \"blue\"\n", "    xy.append( ( (els[0], els[1]), (name, color)))\n", "folium_html_stations_map(xy, width=\"80%\")"]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0"}}, "nbformat": 4, "nbformat_minor": 2}