{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 2A.i - Mod\u00e8le relationnel, analyse d'incidents dans le transport a\u00e9rien\n", "\n", "Base de donn\u00e9es relationnelles, logique SQL."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Donn\u00e9es\n", "\n", "Le code suivant t\u00e9l\u00e9charge les donn\u00e9es n\u00e9cessaires [tp_2a_5_compagnies.zip](http://www.xavierdupre.fr/enseignement/complements/tp_2a_5_compagnies.zip)."]}, {"cell_type": "code", "execution_count": 2, "metadata": {"scrolled": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": [" downloading of http://www.xavierdupre.fr/enseignement/complements/tp_2a_5_compagnies.zip to tp_2a_5_compagnies.zip\n", " unzipped Cities.csv to .\\Cities.csv\n", " unzipped Compagnies.csv to .\\Compagnies.csv\n", " unzipped Crews.csv to .\\Crews.csv\n", " unzipped Crews_planes_habilitation.csv to .\\Crews_planes_habilitation.csv\n", " unzipped Data_exemple.csv to .\\Data_exemple.csv\n", " unzipped Data_exemple_2.csv to .\\Data_exemple_2.csv\n", " unzipped Data_exemple_3.csv to .\\Data_exemple_3.csv\n", " unzipped Data_exemple_4.csv to .\\Data_exemple_4.csv\n", " unzipped data_model.txt to .\\data_model.txt\n", " unzipped data_model_to_csv.py to .\\data_model_to_csv.py\n", " unzipped enonce.txt to .\\enonce.txt\n", " unzipped Exemple_arborescent.xml to .\\Exemple_arborescent.xml\n", " unzipped Flights.csv to .\\Flights.csv\n", " unzipped Incident.csv to .\\Incident.csv\n", " unzipped Motors.csv to .\\Motors.csv\n", " unzipped Motor_models.csv to .\\Motor_models.csv\n", " unzipped Planes.csv to .\\Planes.csv\n", " unzipped Plane_models.csv to .\\Plane_models.csv\n"]}, {"data": {"text/plain": ["['.\\\\Cities.csv',\n", " '.\\\\Compagnies.csv',\n", " '.\\\\Crews.csv',\n", " '.\\\\Crews_planes_habilitation.csv',\n", " '.\\\\Data_exemple.csv',\n", " '.\\\\Data_exemple_2.csv',\n", " '.\\\\Data_exemple_3.csv',\n", " '.\\\\Data_exemple_4.csv',\n", " '.\\\\data_model.txt',\n", " '.\\\\data_model_to_csv.py',\n", " '.\\\\enonce.txt',\n", " '.\\\\Exemple_arborescent.xml',\n", " '.\\\\Flights.csv',\n", " '.\\\\Incident.csv',\n", " '.\\\\Motors.csv',\n", " '.\\\\Motor_models.csv',\n", " '.\\\\Planes.csv',\n", " '.\\\\Plane_models.csv']"]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["import pyensae.datasource\n", "pyensae.datasource.download_data(\"tp_2a_5_compagnies.zip\")"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": ["import os\n", "import pandas\n", "df_Incident = pandas.read_csv('Incident.csv', sep=';')\n", "df_Flights = pandas.read_csv('Flights.csv', sep=';')\n", "df_Crews = pandas.read_csv('Crews.csv', sep=';')\n", "df_Crews_planes_habilitation = pandas.read_csv('Crews_planes_habilitation.csv', sep=';')\n", "df_Planes = pandas.read_csv('Planes.csv', sep=';')\n", "df_Plane_models = pandas.read_csv('Plane_models.csv', sep=';')\n", "df_Motors = pandas.read_csv('Motors.csv', sep=';')\n", "df_Motor_models = pandas.read_csv('Motor_models.csv', sep=';')\n", "df_Compagnies = pandas.read_csv('Compagnies.csv', sep=';')\n", "df_Cities = pandas.read_csv('Cities.csv', sep=';')"]}, {"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", "
\n", " \n", " Flight_id | \n", "
\n", " \n", " \n", " \n", " 2 | \n", "
\n", " \n", " 11 | \n", "
\n", " \n", " 21 | \n", "
\n", " \n", " 30 | \n", "
\n", " \n", " 33 | \n", "
\n", " \n", "
\n", "
"], "text/plain": ["Empty DataFrame\n", "Columns: []\n", "Index: [2, 11, 21, 30, 33]"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Incident.head(5)"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Plane_id | \n", " Crew_id | \n", " Departure_id | \n", " Arrival_id | \n", " Date | \n", "
\n", " \n", " Id | \n", " | \n", " | \n", " | \n", " | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " 40 | \n", " 85 | \n", " 7 | \n", " 0 | \n", " datetime.date(2013, 2, 20) | \n", "
\n", " \n", " 1 | \n", " 67 | \n", " 6 | \n", " 16 | \n", " 4 | \n", " datetime.date(2013, 1, 9) | \n", "
\n", " \n", " 2 | \n", " 57 | \n", " 67 | \n", " 1 | \n", " 18 | \n", " datetime.date(2013, 8, 7) | \n", "
\n", " \n", " 3 | \n", " 41 | \n", " 69 | \n", " 16 | \n", " 14 | \n", " datetime.date(2013, 1, 10) | \n", "
\n", " \n", " 4 | \n", " 60 | \n", " 24 | \n", " 4 | \n", " 7 | \n", " datetime.date(2013, 9, 26) | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Plane_id Crew_id Departure_id Arrival_id Date\n", "Id \n", "0 40 85 7 0 datetime.date(2013, 2, 20)\n", "1 67 6 16 4 datetime.date(2013, 1, 9)\n", "2 57 67 1 18 datetime.date(2013, 8, 7)\n", "3 41 69 16 14 datetime.date(2013, 1, 10)\n", "4 60 24 4 7 datetime.date(2013, 9, 26)"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Flights.head(5)"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Captain_age | \n", "
\n", " \n", " Id | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " 54 | \n", "
\n", " \n", " 1 | \n", " 53 | \n", "
\n", " \n", " 2 | \n", " 56 | \n", "
\n", " \n", " 3 | \n", " 43 | \n", "
\n", " \n", " 4 | \n", " 45 | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Captain_age\n", "Id \n", "0 54\n", "1 53\n", "2 56\n", "3 43\n", "4 45"]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Crews.head(5)"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Plane_model_id | \n", "
\n", " \n", " Crew_id | \n", " | \n", "
\n", " \n", " \n", " \n", " 85 | \n", " 40 | \n", "
\n", " \n", " 6 | \n", " 67 | \n", "
\n", " \n", " 67 | \n", " 57 | \n", "
\n", " \n", " 69 | \n", " 41 | \n", "
\n", " \n", " 24 | \n", " 60 | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Plane_model_id\n", "Crew_id \n", "85 40\n", "6 67\n", "67 57\n", "69 41\n", "24 60"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Crews_planes_habilitation.head(5)"]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Immatriculation | \n", " Plane_model_id | \n", " Motor_id | \n", " Compagny_id | \n", " Nb_hours | \n", "
\n", " \n", " Id | \n", " | \n", " | \n", " | \n", " | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " 'Q-KGNK' | \n", " 3 | \n", " 0 | \n", " 0 | \n", " 1213 | \n", "
\n", " \n", " 1 | \n", " 'Q-HTQG' | \n", " 3 | \n", " 1 | \n", " 0 | \n", " 7932 | \n", "
\n", " \n", " 2 | \n", " 'Q-XRMC' | \n", " 1 | \n", " 2 | \n", " 0 | \n", " 2444 | \n", "
\n", " \n", " 3 | \n", " 'Q-GUOA' | \n", " 1 | \n", " 3 | \n", " 0 | \n", " 1595 | \n", "
\n", " \n", " 4 | \n", " 'Q-WGIW' | \n", " 0 | \n", " 4 | \n", " 0 | \n", " 2270 | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Immatriculation Plane_model_id Motor_id Compagny_id Nb_hours\n", "Id \n", "0 'Q-KGNK' 3 0 0 1213\n", "1 'Q-HTQG' 3 1 0 7932\n", "2 'Q-XRMC' 1 2 0 2444\n", "3 'Q-GUOA' 1 3 0 1595\n", "4 'Q-WGIW' 0 4 0 2270"]}, "execution_count": 9, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Planes.head(5)"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Constructor | \n", " Model_name | \n", "
\n", " \n", " Id | \n", " | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " AIRBUS | \n", " A380 | \n", "
\n", " \n", " 1 | \n", " AIRBUS | \n", " A350 | \n", "
\n", " \n", " 2 | \n", " BOEING | \n", " 787 | \n", "
\n", " \n", " 3 | \n", " BOEING | \n", " 747 | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Constructor Model_name\n", "Id \n", "0 AIRBUS A380\n", "1 AIRBUS A350\n", "2 BOEING 787\n", "3 BOEING 747"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Plane_models.head(5)"]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Motor_model_id | \n", " Nb_hours | \n", "
\n", " \n", " Id | \n", " | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " 4 | \n", " 2334 | \n", "
\n", " \n", " 1 | \n", " 2 | \n", " 3609 | \n", "
\n", " \n", " 2 | \n", " 2 | \n", " 7867 | \n", "
\n", " \n", " 3 | \n", " 3 | \n", " 2673 | \n", "
\n", " \n", " 4 | \n", " 2 | \n", " 3871 | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Motor_model_id Nb_hours\n", "Id \n", "0 4 2334\n", "1 2 3609\n", "2 2 7867\n", "3 3 2673\n", "4 2 3871"]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Motors.head(5)"]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Name | \n", " Manufacturer | \n", "
\n", " \n", " Id | \n", " | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " SNECMA-123 | \n", " SNECMA | \n", "
\n", " \n", " 1 | \n", " SNECMA-246 | \n", " SNECMA | \n", "
\n", " \n", " 2 | \n", " GE-247 | \n", " GE | \n", "
\n", " \n", " 3 | \n", " GE-656 | \n", " GE | \n", "
\n", " \n", " 4 | \n", " ROLLS ROYCE-345 | \n", " ROLLS ROYCE | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Name Manufacturer\n", "Id \n", "0 SNECMA-123 SNECMA\n", "1 SNECMA-246 SNECMA\n", "2 GE-247 GE\n", "3 GE-656 GE\n", "4 ROLLS ROYCE-345 ROLLS ROYCE"]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Motor_models.head(5)"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Name | \n", " Country | \n", "
\n", " \n", " Id | \n", " | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " Qatar Airways | \n", " Qatar | \n", "
\n", " \n", " 1 | \n", " Air France | \n", " France | \n", "
\n", " \n", " 2 | \n", " Iberia | \n", " Espagne | \n", "
\n", " \n", " 3 | \n", " Lufthansa | \n", " Allemagne | \n", "
\n", " \n", " 4 | \n", " Ryan Air | \n", " Irlande | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Name Country\n", "Id \n", "0 Qatar Airways Qatar\n", "1 Air France France\n", "2 Iberia Espagne\n", "3 Lufthansa Allemagne\n", "4 Ryan Air Irlande"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Compagnies.head(5)"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Name | \n", " Country | \n", "
\n", " \n", " Id | \n", " | \n", " | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " Bombay | \n", " Inde | \n", "
\n", " \n", " 1 | \n", " New York | \n", " USA | \n", "
\n", " \n", " 2 | \n", " Tokyo | \n", " Japon | \n", "
\n", " \n", " 3 | \n", " Paris | \n", " France | \n", "
\n", " \n", " 4 | \n", " Francfort | \n", " Allemagne | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Name Country\n", "Id \n", "0 Bombay Inde\n", "1 New York USA\n", "2 Tokyo Japon\n", "3 Paris France\n", "4 Francfort Allemagne"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Cities.head(5)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Bien s\u00fbr, toutes les informations ne sont pas dans les tables telles quelles, il faudra faire principalement des jointures et des groupby pour obtenir les informations que l'on souhaite. Pour obtenir une table contenant les vols avec une colonne \"est_incident\", il faut faire :"]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"data": {"text/html": ["\n", "
\n", " \n", " \n", " | \n", " Id | \n", " Plane_id | \n", " Crew_id | \n", " Departure_id | \n", " Arrival_id | \n", " Date | \n", " Flight_id | \n", " Is_incident | \n", "
\n", " \n", " \n", " \n", " 0 | \n", " 0 | \n", " 40 | \n", " 85 | \n", " 7 | \n", " 0 | \n", " datetime.date(2013, 2, 20) | \n", " NaN | \n", " False | \n", "
\n", " \n", " 1 | \n", " 1 | \n", " 67 | \n", " 6 | \n", " 16 | \n", " 4 | \n", " datetime.date(2013, 1, 9) | \n", " NaN | \n", " False | \n", "
\n", " \n", " 2 | \n", " 2 | \n", " 57 | \n", " 67 | \n", " 1 | \n", " 18 | \n", " datetime.date(2013, 8, 7) | \n", " 2 | \n", " True | \n", "
\n", " \n", " 3 | \n", " 3 | \n", " 41 | \n", " 69 | \n", " 16 | \n", " 14 | \n", " datetime.date(2013, 1, 10) | \n", " NaN | \n", " False | \n", "
\n", " \n", " 4 | \n", " 4 | \n", " 60 | \n", " 24 | \n", " 4 | \n", " 7 | \n", " datetime.date(2013, 9, 26) | \n", " NaN | \n", " False | \n", "
\n", " \n", "
\n", "
"], "text/plain": [" Id Plane_id Crew_id Departure_id Arrival_id \\\n", "0 0 40 85 7 0 \n", "1 1 67 6 16 4 \n", "2 2 57 67 1 18 \n", "3 3 41 69 16 14 \n", "4 4 60 24 4 7 \n", "\n", " Date Flight_id Is_incident \n", "0 datetime.date(2013, 2, 20) NaN False \n", "1 datetime.date(2013, 1, 9) NaN False \n", "2 datetime.date(2013, 8, 7) 2 True \n", "3 datetime.date(2013, 1, 10) NaN False \n", "4 datetime.date(2013, 9, 26) NaN False "]}, "execution_count": 15, "metadata": {}, "output_type": "execute_result"}], "source": ["import numpy as np\n", "try:\n", " df_Flights.reset_index( inplace = True )\n", " df_Incident.reset_index( inplace = True )\n", "except Exception:\n", " pass\n", " ## On suppose que cela vient du fait que les index ont d\u00e9j\u00e0 \u00e9t\u00e9 remis \u00e0 z\u00e9ros\n", "df_Flight_Incident = pandas.merge( df_Flights, df_Incident, left_on = \"Id\", right_on = \"Flight_id\", how=\"outer\" )\n", "df_Flight_Incident[\"Is_incident\"] = np.isnan( df_Flight_Incident[\"Flight_id\"] ) == False\n", "df_Flight_Incident.head(5)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["On peut aussi faire des statistiques par ville de d\u00e9part ..."]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"data": {"text/plain": ["Departure_id\n", "0 0.146789\n", "1 0.192308\n", "2 0.220930\n", "3 0.222222\n", "4 0.206107\n", "5 0.218391\n", "6 0.180180\n", "7 0.261682\n", "8 0.120879\n", "9 0.131868\n", "10 0.141176\n", "11 0.184466\n", "12 0.097087\n", "13 0.204082\n", "14 0.174312\n", "15 0.186916\n", "16 0.206897\n", "17 0.272727\n", "18 0.189474\n", "19 0.211538\n", "Name: Is_incident, dtype: float64"]}, "execution_count": 16, "metadata": {}, "output_type": "execute_result"}], "source": ["df_Flight_Incident.groupby( \"Departure_id\" )[\"Is_incident\"].mean()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 1 : Question pr\u00e9liminaires, taux d'incident\n", "\n", "A vous de jouer. Commencer par calculer le taux d'incident global."]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 2 : Calcul de taux\n", " \n", "Calculer le taux d'incidents par : \n", " => Compagnies \n", " => Destination \n", " => Equipage \n", " => Constructeur d'appareil (Boeing, Airbus ou Bombardier) \n", " => Mod\u00e8le d'appareil (A320, A380, B737, etc ...) \n", " => Moteurs "]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 3 : R\u00e9gression lin\u00e9aire\n", " \n", "Pouvez-vous raisonnablement faire une r\u00e9gression lin\u00e9aire sur le nombre d'heure de l'appareil ? Du moteur ? Pourquoi ?"]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 4 : Iberia VS Lufthansa\n", " \n", "Pouvez-vous dire si, toute chose \u00e9gale par ailleurs, un avion d'Iberia a plus ou moins de chance d'avoir un incident qu'un avion de la Lufthansa ?"]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": []}, {"cell_type": "markdown", "metadata": {}, "source": ["### Question 5 : Facteurs explicatifs\n", " \n", "Essayer de d\u00e9terminer quels sont les facteurs explicatifs. Lesquels vous paraissent statistiquement significatifs ? Y-a-t-il des corr\u00e9lations indirectes ? "]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}