{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# 1A.data - D\u00e9corr\u00e9lation de variables al\u00e9atoires - correction\n", "\n", "On construit des variables corr\u00e9l\u00e9es gaussiennes et on cherche \u00e0 construire des variables d\u00e9corr\u00e9l\u00e9es en utilisant le calcul matriciel. (correction)"]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
run previous cell, wait for 2 seconds
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Cr\u00e9ation d'un jeu de donn\u00e9es"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q1\n", "\n", "La premi\u00e8re \u00e9tape consiste \u00e0 construire des variables al\u00e9atoires normales corr\u00e9l\u00e9es dans une matrice $N \\times 3$. On cherche \u00e0 construire cette matrice au format [numpy](http://www.numpy.org/). Le programme suivant est un moyen de construire un tel ensemble \u00e0 l'aide de combinaisons lin\u00e9aires. Compl\u00e9tez les lignes contenant des ``....``."]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[-0.63295784, -2.80012415, 0.22050058],\n", " [-1.21821148, -3.04992927, 3.24455663],\n", " [-0.32571451, -0.93074193, 0.50069519],\n", " [-0.13063124, -1.07137214, -0.56615199],\n", " [-0.36056318, -1.50832676, 0.32408593]])"]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["import random\n", "import numpy as np\n", "\n", "def combinaison():\n", " x = random.gauss(0,1) # g\u00e9n\u00e8re un nombre al\u00e9atoire\n", " y = random.gauss(0,1) # selon une loi normale\n", " z = random.gauss(0,1) # de moyenne null et de variance 1\n", " x2 = x\n", " y2 = 3*x + y\n", " z2 = -2*x + y + 0.2*z\n", " return [x2, y2, z2]\n", " \n", "li = [ combinaison () for i in range (0,100) ]\n", "mat = np.matrix(li)\n", "mat[:5]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q2 "]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[ 0.82547076, 2.51922244, -1.58633195],\n", " [ 2.51922244, 9.04578378, -3.50440536],\n", " [-1.58633195, -3.50440536, 4.40003306]])"]}, "execution_count": 4, "metadata": {}, "output_type": "execute_result"}], "source": ["npm = mat\n", "t = npm.transpose ()\n", "a = t @ npm \n", "a /= npm.shape[0]\n", "a"]}, {"cell_type": "markdown", "metadata": {}, "source": ["``a`` est la matrice de covariance."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Corr\u00e9lation de matrices"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q3"]}, {"cell_type": "code", "execution_count": 4, "metadata": {"collapsed": true}, "outputs": [], "source": ["cov = a"]}, {"cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[ 1.21142995, 0.36595362, 0.52471223],\n", " [ 0.36595362, 0.11054874, 0.15850718],\n", " [ 0.52471223, 0.15850718, 0.22727102]])"]}, "execution_count": 6, "metadata": {}, "output_type": "execute_result"}], "source": ["var = np.array([cov[i,i]**(-0.5) for i in range(cov.shape[0])])\n", "var.resize((3,1))\n", "varvar = var @ var.transpose()\n", "varvar"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[ 1. , 0.92191858, -0.83236777],\n", " [ 0.92191858, 1. , -0.5554734 ],\n", " [-0.83236777, -0.5554734 , 1. ]])"]}, "execution_count": 7, "metadata": {}, "output_type": "execute_result"}], "source": ["cor = np.multiply(cov, varvar)\n", "cor"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q4 "]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[ 1. , 0.92191858, -0.83236777],\n", " [ 0.92191858, 1. , -0.5554734 ],\n", " [-0.83236777, -0.5554734 , 1. ]])"]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["def correlation(npm):\n", " t = npm.transpose ()\n", " a = t @ npm \n", " a /= npm.shape[0]\n", " var = np.array([cov[i,i]**(-0.5) for i in range(cov.shape[0])])\n", " var.resize((3,1))\n", " varvar = var @ var.transpose()\n", " return np.multiply(cov, varvar)\n", "\n", "correlation(npm)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Calcul de la racine carr\u00e9e"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q6\n", "\n", "Le module [numpy](http://www.numpy.org/) propose une fonction qui retourne la matrice $P$ et le vecteur des valeurs propres $L$ :\n", "\n", "```\n", "L,P = np.linalg.eig(a)\n", "```\n", "\n", "V\u00e9rifier que $P'P=I$. Est-ce rigoureusement \u00e9gal \u00e0 la matrice identit\u00e9 ?"]}, {"cell_type": "code", "execution_count": 8, "metadata": {"collapsed": true}, "outputs": [], "source": ["L,P = np.linalg.eig(a)"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[ 1.00000000e+00, -6.13577229e-17, -2.23247265e-16],\n", " [ -6.13577229e-17, 1.00000000e+00, -1.20740141e-16],\n", " [ -2.23247265e-16, -1.20740141e-16, 1.00000000e+00]])"]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["P.transpose() @ P"]}, {"cell_type": "markdown", "metadata": {}, "source": ["C'est presque l'identit\u00e9 aux erreurs de calcul pr\u00e8s."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q7\n", "\n", "``np.diag(l)`` construit une matrice diagonale \u00e0 partir d'un vecteur."]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[ 1.17360418e+01, 0.00000000e+00, 0.00000000e+00],\n", " [ 0.00000000e+00, 1.31745703e-03, 0.00000000e+00],\n", " [ 0.00000000e+00, 0.00000000e+00, 2.53392830e+00]])"]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["np.diag(L)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q8\n", "\n", "Ecrire une fonction qui calcule la racine carr\u00e9e de la matrice $\\frac{1}{n}M'M$ (on rappelle que $M$ est la matrice ``npm``). Voir aussi [Racine carr\u00e9e d'une matrice](https://fr.wikipedia.org/wiki/Racine_carr%C3%A9e_d%27une_matrice)."]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[ 0.27891067, 0.69732306, -0.51129263],\n", " [ 0.69732306, 2.85042611, -0.65923845],\n", " [-0.51129263, -0.65923845, 1.92458244]])"]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["def square_root_matrix(M):\n", " L,P = np.linalg.eig(M)\n", " L = L ** 0.5\n", " root = P @ np.diag(L) @ P.transpose()\n", " return root\n", " \n", "root = square_root_matrix(cov)\n", "root"]}, {"cell_type": "markdown", "metadata": {}, "source": ["On v\u00e9rifie qu'on ne s'est pas tromp\u00e9."]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[ 0.00000000e+00, -1.33226763e-15, -8.88178420e-16],\n", " [ -8.88178420e-16, -8.88178420e-15, 4.44089210e-16],\n", " [ -6.66133815e-16, 1.77635684e-15, 1.77635684e-15]])"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["root @ root - cov"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## D\u00e9corr\u00e9lation"]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/plain": ["matrix([[ 702.44132815, -141.03278518, 140.9237308 ],\n", " [-141.03278518, 28.4757628 , -28.16665145],\n", " [ 140.9237308 , -28.16665145, 28.60079705]])"]}, "execution_count": 14, "metadata": {}, "output_type": "execute_result"}], "source": ["np.linalg.inv(cov)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q9 \n", "\n", "Chaque ligne de la matrice $M$ repr\u00e9sente un vecteur de trois variables corr\u00e9l\u00e9es. La matrice de covariance est $V=\\frac{1}{n}M'M$. Calculer la matrice de covariance de la matrice $N=M V^{-\\frac{1}{2}}$ (math\u00e9matiquement).\n"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q10\n", "\n", "V\u00e9rifier num\u00e9riquement."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Simulation de variables corr\u00e9l\u00e9es"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q11\n", "\n", "A partir du r\u00e9sultat pr\u00e9c\u00e9dent, proposer une m\u00e9thode pour simuler un vecteur de variables corr\u00e9l\u00e9es selon une matrice de covariance $V$ \u00e0 partir d'un vecteur de lois normales ind\u00e9pendantes."]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q12\n", "\n", "Proposer une fonction qui cr\u00e9e cet \u00e9chantillon :"]}, {"cell_type": "code", "execution_count": 14, "metadata": {"collapsed": true}, "outputs": [], "source": ["def simultation (N, cov) :\n", " # simule un \u00e9chantillon de variables corr\u00e9l\u00e9es\n", " # N : nombre de variables\n", " # cov : matrice de covariance\n", " # ...\n", " return M"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Q13\n", "\n", "V\u00e9rifier que votre \u00e9chantillon a une matrice de corr\u00e9lations proche de celle choisie pour simuler l'\u00e9chantillon."]}, {"cell_type": "code", "execution_count": 15, "metadata": {"collapsed": true}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1"}}, "nbformat": 4, "nbformat_minor": 2}