{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.i - Mod\u00e8le relationnel, analyse d'incidents dans le transport a\u00e9rien\n", "\n", "Base de donn\u00e9es relationnelles, logique SQL."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
run previous cell, wait for 2 seconds
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Donn\u00e9es\n", "\n", "Le code suivant t\u00e9l\u00e9charge les donn\u00e9es n\u00e9cessaires [tp_2a_5_compagnies.zip](http://www.xavierdupre.fr/enseignement/complements/tp_2a_5_compagnies.zip)."]}, {"cell_type": "code", "execution_count": 2, "metadata": {"scrolled": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": [" downloading of http://www.xavierdupre.fr/enseignement/complements/tp_2a_5_compagnies.zip to tp_2a_5_compagnies.zip\n", " unzipped Cities.csv to .\\Cities.csv\n", " unzipped Compagnies.csv to .\\Compagnies.csv\n", " unzipped Crews.csv to .\\Crews.csv\n", " unzipped Crews_planes_habilitation.csv to .\\Crews_planes_habilitation.csv\n", " unzipped Data_exemple.csv to .\\Data_exemple.csv\n", " unzipped Data_exemple_2.csv to .\\Data_exemple_2.csv\n", " unzipped Data_exemple_3.csv to .\\Data_exemple_3.csv\n", " unzipped Data_exemple_4.csv to .\\Data_exemple_4.csv\n", " unzipped data_model.txt to .\\data_model.txt\n", " unzipped data_model_to_csv.py to .\\data_model_to_csv.py\n", " unzipped enonce.txt to .\\enonce.txt\n", " unzipped Exemple_arborescent.xml to .\\Exemple_arborescent.xml\n", " unzipped Flights.csv to .\\Flights.csv\n", " unzipped Incident.csv to .\\Incident.csv\n", " unzipped Motors.csv to .\\Motors.csv\n", " unzipped Motor_models.csv to .\\Motor_models.csv\n", " unzipped Planes.csv to .\\Planes.csv\n", " unzipped Plane_models.csv to .\\Plane_models.csv\n"]}, {"data": {"text/plain": ["['.\\\\Cities.csv',\n", " '.\\\\Compagnies.csv',\n", " '.\\\\Crews.csv',\n", " '.\\\\Crews_planes_habilitation.csv',\n", " '.\\\\Data_exemple.csv',\n", " '.\\\\Data_exemple_2.csv',\n", " '.\\\\Data_exemple_3.csv',\n", " '.\\\\Data_exemple_4.csv',\n", " '.\\\\data_model.txt',\n", " '.\\\\data_model_to_csv.py',\n", " '.\\\\enonce.txt',\n", " '.\\\\Exemple_arborescent.xml',\n", " '.\\\\Flights.csv',\n", " '.\\\\Incident.csv',\n", " '.\\\\Motors.csv',\n", " '.\\\\Motor_models.csv',\n", " '.\\\\Planes.csv',\n", " '.\\\\Plane_models.csv']"]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["import pyensae.datasource\n", "pyensae.datasource.download_data(\"tp_2a_5_compagnies.zip\")"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": ["import os\n", "import pandas\n", "df_Incident = pandas.read_csv('Incident.csv', sep=';')\n", "df_Flights = pandas.read_csv('Flights.csv', sep=';')\n", "df_Crews = pandas.read_csv('Crews.csv', sep=';')\n", "df_Crews_planes_habilitation = pandas.read_csv('Crews_planes_habilitation.csv', sep=';')\n", "df_Planes = pandas.read_csv('Planes.csv', sep=';')\n", "df_Plane_models = pandas.read_csv('Plane_models.csv', sep=';')\n", "df_Motors = pandas.read_csv('Motors.csv', sep=';')\n", "df_Motor_models = pandas.read_csv('Motor_models.csv', sep=';')\n", "df_Compagnies = pandas.read_csv('Compagnies.csv', sep=';')\n", "df_Cities = pandas.read_csv('Cities.csv', sep=';')"]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Flight_id
2
11
21
30
33
\n", "
"], "text/plain": ["Empty DataFrame\n", "Columns: []\n", "Index: [2, 11, 21, 30, 33]"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Incident.head(5)"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Plane_idCrew_idDeparture_idArrival_idDate
Id
0408570datetime.date(2013, 2, 20)
1676164datetime.date(2013, 1, 9)
25767118datetime.date(2013, 8, 7)
341691614datetime.date(2013, 1, 10)
4602447datetime.date(2013, 9, 26)
\n", "
"], "text/plain": [" Plane_id Crew_id Departure_id Arrival_id Date\n", "Id \n", "0 40 85 7 0 datetime.date(2013, 2, 20)\n", "1 67 6 16 4 datetime.date(2013, 1, 9)\n", "2 57 67 1 18 datetime.date(2013, 8, 7)\n", "3 41 69 16 14 datetime.date(2013, 1, 10)\n", "4 60 24 4 7 datetime.date(2013, 9, 26)"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Flights.head(5)"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Captain_age
Id
054
153
256
343
445
\n", "
"], "text/plain": [" Captain_age\n", "Id \n", "0 54\n", "1 53\n", "2 56\n", "3 43\n", "4 45"]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Crews.head(5)"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Plane_model_id
Crew_id
8540
667
6757
6941
2460
\n", "
"], "text/plain": [" Plane_model_id\n", "Crew_id \n", "85 40\n", "6 67\n", "67 57\n", "69 41\n", "24 60"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Crews_planes_habilitation.head(5)"]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ImmatriculationPlane_model_idMotor_idCompagny_idNb_hours
Id
0'Q-KGNK'3001213
1'Q-HTQG'3107932
2'Q-XRMC'1202444
3'Q-GUOA'1301595
4'Q-WGIW'0402270
\n", "
"], "text/plain": [" Immatriculation Plane_model_id Motor_id Compagny_id Nb_hours\n", "Id \n", "0 'Q-KGNK' 3 0 0 1213\n", "1 'Q-HTQG' 3 1 0 7932\n", "2 'Q-XRMC' 1 2 0 2444\n", "3 'Q-GUOA' 1 3 0 1595\n", "4 'Q-WGIW' 0 4 0 2270"]}, "execution_count": 9, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Planes.head(5)"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ConstructorModel_name
Id
0AIRBUSA380
1AIRBUSA350
2BOEING787
3BOEING747
\n", "
"], "text/plain": [" Constructor Model_name\n", "Id \n", "0 AIRBUS A380\n", "1 AIRBUS A350\n", "2 BOEING 787\n", "3 BOEING 747"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Plane_models.head(5)"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Motor_model_idNb_hours
Id
042334
123609
227867
332673
423871
\n", "
"], "text/plain": [" Motor_model_id Nb_hours\n", "Id \n", "0 4 2334\n", "1 2 3609\n", "2 2 7867\n", "3 3 2673\n", "4 2 3871"]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Motors.head(5)"]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameManufacturer
Id
0SNECMA-123SNECMA
1SNECMA-246SNECMA
2GE-247GE
3GE-656GE
4ROLLS ROYCE-345ROLLS ROYCE
\n", "
"], "text/plain": [" Name Manufacturer\n", "Id \n", "0 SNECMA-123 SNECMA\n", "1 SNECMA-246 SNECMA\n", "2 GE-247 GE\n", "3 GE-656 GE\n", "4 ROLLS ROYCE-345 ROLLS ROYCE"]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Motor_models.head(5)"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameCountry
Id
0Qatar AirwaysQatar
1Air FranceFrance
2IberiaEspagne
3LufthansaAllemagne
4Ryan AirIrlande
\n", "
"], "text/plain": [" Name Country\n", "Id \n", "0 Qatar Airways Qatar\n", "1 Air France France\n", "2 Iberia Espagne\n", "3 Lufthansa Allemagne\n", "4 Ryan Air Irlande"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Compagnies.head(5)"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameCountry
Id
0BombayInde
1New YorkUSA
2TokyoJapon
3ParisFrance
4FrancfortAllemagne
\n", "
"], "text/plain": [" Name Country\n", "Id \n", "0 Bombay Inde\n", "1 New York USA\n", "2 Tokyo Japon\n", "3 Paris France\n", "4 Francfort Allemagne"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Cities.head(5)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Bien s\u00fbr, toutes les informations ne sont pas dans les tables telles quelles, il faudra faire principalement des jointures et des groupby pour obtenir les informations que l'on souhaite. Pour obtenir une table contenant les vols avec une colonne \"est_incident\", il faut faire :"]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IdPlane_idCrew_idDeparture_idArrival_idDateFlight_idIs_incident
00408570datetime.date(2013, 2, 20)NaNFalse
11676164datetime.date(2013, 1, 9)NaNFalse
225767118datetime.date(2013, 8, 7)2True
3341691614datetime.date(2013, 1, 10)NaNFalse
44602447datetime.date(2013, 9, 26)NaNFalse
\n", "
"], "text/plain": [" Id Plane_id Crew_id Departure_id Arrival_id \\\n", "0 0 40 85 7 0 \n", "1 1 67 6 16 4 \n", "2 2 57 67 1 18 \n", "3 3 41 69 16 14 \n", "4 4 60 24 4 7 \n", "\n", " Date Flight_id Is_incident \n", "0 datetime.date(2013, 2, 20) NaN False \n", "1 datetime.date(2013, 1, 9) NaN False \n", "2 datetime.date(2013, 8, 7) 2 True \n", "3 datetime.date(2013, 1, 10) NaN False \n", "4 datetime.date(2013, 9, 26) NaN False "]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy as np\n", "try:\n", " df_Flights.reset_index( inplace = True )\n", " df_Incident.reset_index( inplace = True )\n", "except Exception:\n", " pass\n", " ## On suppose que cela vient du fait que les index ont d\u00e9j\u00e0 \u00e9t\u00e9 remis \u00e0 z\u00e9ros\n", "df_Flight_Incident = pandas.merge( df_Flights, df_Incident, left_on = \"Id\", right_on = \"Flight_id\", how=\"outer\" )\n", "df_Flight_Incident[\"Is_incident\"] = np.isnan( df_Flight_Incident[\"Flight_id\"] ) == False\n", "df_Flight_Incident.head(5)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["On peut aussi faire des statistiques par ville de d\u00e9part ..."]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"data": {"text/plain": ["Departure_id\n", "0 0.146789\n", "1 0.192308\n", "2 0.220930\n", "3 0.222222\n", "4 0.206107\n", "5 0.218391\n", "6 0.180180\n", "7 0.261682\n", "8 0.120879\n", "9 0.131868\n", "10 0.141176\n", "11 0.184466\n", "12 0.097087\n", "13 0.204082\n", "14 0.174312\n", "15 0.186916\n", "16 0.206897\n", "17 0.272727\n", "18 0.189474\n", "19 0.211538\n", "Name: Is_incident, dtype: float64"]}, "execution_count": 16, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Flight_Incident.groupby( \"Departure_id\" )[\"Is_incident\"].mean()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 1 : Question pr\u00e9liminaires, taux d'incident\n", "\n", "A vous de jouer. Commencer par calculer le taux d'incident global."]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 2 : Calcul de taux\n", " \n", "Calculer le taux d'incidents par : \n", " => Compagnies \n", " => Destination \n", " => Equipage \n", " => Constructeur d'appareil (Boeing, Airbus ou Bombardier) \n", " => Mod\u00e8le d'appareil (A320, A380, B737, etc ...) \n", " => Moteurs "]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 3 : R\u00e9gression lin\u00e9aire\n", " \n", "Pouvez-vous raisonnablement faire une r\u00e9gression lin\u00e9aire sur le nombre d'heure de l'appareil ? Du moteur ? Pourquoi ?"]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 4 : Iberia VS Lufthansa\n", " \n", "Pouvez-vous dire si, toute chose \u00e9gale par ailleurs, un avion d'Iberia a plus ou moins de chance d'avoir un incident qu'un avion de la Lufthansa ?"]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 5 : Facteurs explicatifs\n", " \n", "Essayer de d\u00e9terminer quels sont les facteurs explicatifs. Lesquels vous paraissent statistiquement significatifs ? Y-a-t-il des corr\u00e9lations indirectes ? "]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}