City Bike Challenge#
Based on the data available at Divvy Data, how can we guess where people usually live and where they usually work?
from jyquickhelper import add_notebook_menu
add_notebook_menu()
The city#
I don’t know Chicago. If I am looking for a restaurant or a bar, where should I go? Let’s try to find the lively places where I could walk around and find a bar…
from pyquickhelper.helpgen import NbImage
NbImage("images/chicago.png")
The data#
Divvy Data publishes a sample of the data.
from pyensae.datasource import download_data
file = download_data("Divvy_Trips_2016_Q3Q4.zip", url="https://s3.amazonaws.com/divvy-data/tripdata/")
We know the stations.
import pandas
stations = df = pandas.read_csv("Divvy_Stations_2016_Q3.csv")
df.head()
|   | id | name | latitude | longitude | dpcapacity | online_date |
|---|---|---|---|---|---|---|
| 0 | 456 | 2112 W Peterson Ave | 41.991178 | -87.683593 | 15 | 5/12/2015 |
| 1 | 101 | 63rd St Beach | 41.781016 | -87.576120 | 23 | 4/20/2015 |
| 2 | 109 | 900 W Harrison St | 41.874675 | -87.650019 | 19 | 8/6/2013 |
| 3 | 21 | Aberdeen St & Jackson Blvd | 41.877726 | -87.654787 | 15 | 6/21/2013 |
| 4 | 80 | Aberdeen St & Monroe St | 41.880420 | -87.655599 | 19 | 6/26/2013 |
And we know the trips.
bikes = df = pandas.read_csv("Divvy_Trips_2016_Q3.csv")
df.head()
|   | trip_id | starttime | stoptime | bikeid | tripduration | from_station_id | from_station_name | to_station_id | to_station_name | usertype | gender | birthyear |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12150160 | 9/30/2016 23:59:58 | 10/1/2016 00:04:03 | 4959 | 245 | 69 | Damen Ave & Pierce Ave | 17 | Wood St & Division St | Subscriber | Male | 1988.0 |
| 1 | 12150159 | 9/30/2016 23:59:58 | 10/1/2016 00:04:09 | 2589 | 251 | 383 | Ashland Ave & Harrison St | 320 | Loomis St & Lexington St | Subscriber | Female | 1990.0 |
| 2 | 12150158 | 9/30/2016 23:59:51 | 10/1/2016 00:24:51 | 3656 | 1500 | 302 | Sheffield Ave & Wrightwood Ave | 334 | Lake Shore Dr & Belmont Ave | Customer | NaN | NaN |
| 3 | 12150157 | 9/30/2016 23:59:51 | 10/1/2016 00:03:56 | 3570 | 245 | 475 | Washtenaw Ave & Lawrence Ave | 471 | Francisco Ave & Foster Ave | Subscriber | Female | 1988.0 |
| 4 | 12150156 | 9/30/2016 23:59:32 | 10/1/2016 00:26:50 | 3158 | 1638 | 302 | Sheffield Ave & Wrightwood Ave | 492 | Leavitt St & Addison St | Customer | NaN | NaN |
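The `starttime` and `stoptime` columns are read as plain strings, so a natural first step is to parse them into timestamps. The sketch below rebuilds the five rows displayed above, keeping only the columns it needs, and derives the start hour and weekday. The names `sample`, `start_hour`, `start_weekday` and `elapsed` are introduced here for illustration and are not part of the original notebook; the date format is inferred from the rows shown.

```python
import pandas

# The five trips displayed above, reduced to the columns used in this sketch.
sample = pandas.DataFrame({
    "trip_id": [12150160, 12150159, 12150158, 12150157, 12150156],
    "starttime": ["9/30/2016 23:59:58", "9/30/2016 23:59:58",
                  "9/30/2016 23:59:51", "9/30/2016 23:59:51",
                  "9/30/2016 23:59:32"],
    "stoptime": ["10/1/2016 00:04:03", "10/1/2016 00:04:09",
                 "10/1/2016 00:24:51", "10/1/2016 00:03:56",
                 "10/1/2016 00:26:50"],
    "tripduration": [245, 251, 1500, 245, 1638],
})

# Assumed format: month/day/year hour:minute:second, as in the rows above.
fmt = "%m/%d/%Y %H:%M:%S"
for col in ["starttime", "stoptime"]:
    sample[col] = pandas.to_datetime(sample[col], format=fmt)

# Hour of departure (0-23) and day of the week (Monday=0, Sunday=6).
sample["start_hour"] = sample["starttime"].dt.hour
sample["start_weekday"] = sample["starttime"].dt.dayofweek

# Elapsed time in seconds, to compare with the tripduration column.
sample["elapsed"] = (sample["stoptime"] - sample["starttime"]).dt.total_seconds()
print(sample[["trip_id", "start_hour", "start_weekday", "elapsed", "tripduration"]])
```

On these five rows the elapsed time computed from the two timestamps equals `tripduration`, which suggests that column is expressed in seconds.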
The challenge#
We know how people use bicycles. People, people… it is us. What do I know about myself that I could use to explore the data and determine the living and working areas of Chicago?
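One possible reading of that question, offered here as an assumption rather than as the intended answer: on weekday mornings I leave home and ride to work, so stations with many more morning departures than arrivals may sit in residential areas, and the opposite may indicate workplaces. The sketch below illustrates the idea on a few toy trips invented for the example (they do not come from the Divvy files); the function `morning_balance` and its parameters are hypothetical names.

```python
import pandas

# Toy trips invented for illustration only; the column names mirror the trips table.
toy = pandas.DataFrame({
    "starttime": pandas.to_datetime([
        "2016-09-26 07:45:00", "2016-09-26 08:10:00", "2016-09-26 08:20:00",
        "2016-09-26 17:30:00", "2016-09-26 17:45:00", "2016-09-26 18:05:00",
    ]),
    "from_station_id": [1, 1, 2, 3, 3, 3],
    "to_station_id": [3, 3, 3, 1, 1, 2],
})


def morning_balance(trips, first_hour=7, last_hour=9):
    """Departures minus arrivals per station for trips starting on weekday mornings.

    Trips are selected on their start time only, which is an approximation
    for the arrival side.
    """
    hours = trips["starttime"].dt.hour
    weekday = trips["starttime"].dt.dayofweek < 5  # Monday to Friday
    morning = trips[weekday & (hours >= first_hour) & (hours <= last_hour)]
    departures = morning["from_station_id"].value_counts()
    arrivals = morning["to_station_id"].value_counts()
    # Stations missing on one side count as zero.
    return departures.sub(arrivals, fill_value=0).astype(int).sort_index()


balance = morning_balance(toy)
print(balance)
```

A positive balance marks a station people mostly leave in the morning, a negative one a station they mostly reach. The same computation on evening hours should show the reverse pattern if the assumption holds.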
A few graphs#
Display the city with two colors. The following map shows the stations with at least 20 slots in red and the others in blue.
from ensae_projects.datainc.data_bikes import folium_html_stations_map
xy = []
for els in stations.apply(lambda row: (row["latitude"], row["longitude"], row["dpcapacity"] >= 20), axis=1):
    # red for stations with at least 20 slots, blue for the others
    xy.append(((els[0], els[1]), "red" if els[2] else "blue"))
folium_html_stations_map(xy, width="80%")
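To make the structure passed to the map function explicit, the sketch below builds the same kind of list, pairs of `((latitude, longitude), color)`, for the five stations displayed earlier. It uses `zip` instead of `DataFrame.apply`, which is only a stylistic alternative; `sample_stations` and `xy_sample` are names introduced for this example.

```python
import pandas

# The five stations displayed earlier in the notebook.
sample_stations = pandas.DataFrame({
    "name": ["2112 W Peterson Ave", "63rd St Beach", "900 W Harrison St",
             "Aberdeen St & Jackson Blvd", "Aberdeen St & Monroe St"],
    "latitude": [41.991178, 41.781016, 41.874675, 41.877726, 41.880420],
    "longitude": [-87.683593, -87.576120, -87.650019, -87.654787, -87.655599],
    "dpcapacity": [15, 23, 19, 15, 19],
})

# One ((latitude, longitude), color) pair per station:
# red when the station has at least 20 slots, blue otherwise.
xy_sample = [
    ((lat, lon), "red" if capacity >= 20 else "blue")
    for lat, lon, capacity in zip(sample_stations["latitude"],
                                  sample_stations["longitude"],
                                  sample_stations["dpcapacity"])
]

for name, (position, color) in zip(sample_stations["name"], xy_sample):
    print(name, position, color)
```

Among these five stations, only 63rd St Beach (23 slots) reaches the threshold and is colored red.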