module datasets.duration#
Short summary#
module papierstat.datasets.duration
Artificial datasets related to duration prediction.
Functions#
function | truncated documentation |
---|---|
duration_selling | Builds an artificial dataset that simulates parcels prepared by a store. Each parcel is prepared as soon as … |
Documentation#
Artificial datasets related to duration prediction.
- papierstat.datasets.duration.duration_selling(date_begin=None, date_end=None, mean_per_day=10, sigma_per_day=5, week_pattern=None, hour_begin=9, hour_end=19, gamma_k=6.0, gamma_theta=0.25)#
Builds an artificial dataset that simulates parcels prepared by a store. Each parcel is prepared as soon as an order is received, at a given time; it is then stored until a customer comes to pick it up.
- Parameters:
date_begin – first date
date_end – last date
hour_begin – opening hour of the store
hour_end – closing hour of the store
week_pattern – array of 7 values, or None for a uniform distribution over the days of the week
mean_per_day – average number of parcels per day (follows a Gaussian distribution)
sigma_per_day – standard deviation of the Gaussian distribution
gamma_k – parameter k of a gamma distribution
gamma_theta – parameter θ of a gamma distribution
- Returns:
dataset
<<<
from papierstat.datasets.duration import duration_selling
print(duration_selling().head())
>>>
                    commande  true_duration                  reception
0 2022-06-02 12:22:51.488341       1.345710 2022-06-02 13:43:36.042642
1 2022-06-02 11:36:21.083464       1.485541 2022-06-02 13:05:29.030540
2 2022-06-02 16:45:11.815302       3.091626 2022-06-03 09:50:41.667922
3 2022-06-02 14:58:24.291455       1.947712 2022-06-02 16:55:16.055390
4 2022-06-02 13:54:50.253030       0.202220 2022-06-02 14:06:58.246025
Orders are spread uniformly over the day, even though this is unrealistic. The duration follows a gamma distribution $\Gamma(k, \theta)$, with parameters gamma_k and gamma_theta. This duration is added to the time at which the order was placed; night hours and weekends are not counted. The duration cannot exceed 10 hours.
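The rule described above (a gamma-distributed duration added to the order time, counting opening hours only) can be sketched as follows. This is a minimal illustration and not papierstat's implementation: the helper `add_open_hours` is hypothetical, the order time is drawn on a single fixed day, and capping the duration with `min` is an assumption about how the 10-hour limit is enforced.

```python
from datetime import datetime, timedelta

import numpy as np


def add_open_hours(start, hours, hour_begin=9, hour_end=19):
    """Add `hours` of opening time to `start`, skipping nights and weekends.

    Hypothetical helper for this sketch, not papierstat's implementation.
    Assumes 0 <= hour_begin < hour_end <= 23.
    """
    current = start
    remaining = timedelta(hours=hours)
    while True:
        if current.weekday() >= 5 or current.hour >= hour_end:
            # Weekend or after closing: jump to the next day's opening.
            current = (current + timedelta(days=1)).replace(
                hour=hour_begin, minute=0, second=0, microsecond=0)
            continue
        if current.hour < hour_begin:
            # Before opening: wait until the store opens.
            current = current.replace(
                hour=hour_begin, minute=0, second=0, microsecond=0)
        closing = current.replace(
            hour=hour_end, minute=0, second=0, microsecond=0)
        available = closing - current
        if remaining <= available:
            return current + remaining
        # Use up the rest of today and carry the remainder over.
        remaining -= available
        current = closing


rng = np.random.default_rng(0)
# Order time drawn uniformly within opening hours of Thursday 2022-06-02.
order = datetime(2022, 6, 2, 9) + timedelta(hours=float(rng.uniform(0, 10)))
# Gamma-distributed duration in hours, capped at 10 (capping is an assumption).
duration = min(float(rng.gamma(6.0, 0.25)), 10.0)
reception = add_open_hours(order, duration)
print(order, round(duration, 3), reception)
```

Applied to the row of the sample output with an order at 16:45 and a duration of about 3.09 hours, the helper rolls the remaining time over to the next morning and lands within a second of the printed reception time.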
- papierstat.datasets.duration.gamma(shape, scale=1.0, size=None)#
Draw samples from a Gamma distribution.
Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated "k") and scale (sometimes designated "theta"), where both parameters are > 0.
Note
New code should use the gamma method of a default_rng() instance instead; please see the Quick Start.
Parameters#
- shape : float or array_like of floats
The shape of the gamma distribution. Must be non-negative.
- scale : float or array_like of floats, optional
The scale of the gamma distribution. Must be non-negative. Default is equal to 1.
- size : int or tuple of ints, optional
Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if shape and scale are both scalars. Otherwise, np.broadcast(shape, scale).size samples are drawn.
Returns#
- out : ndarray or scalar
Drawn samples from the parameterized gamma distribution.
See Also#
- scipy.stats.gamma : probability density function, distribution or cumulative density function, etc.
- random.Generator.gamma : which should be used for new code.
Notes#
The probability density for the Gamma distribution is
$$p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},$$
where $k$ is the shape and $\theta$ the scale, and $\Gamma$ is the Gamma function.
The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
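As a quick sanity check of the density with shape k and scale θ, the sketch below evaluates it with the standard library only and integrates it numerically. The function name `gamma_pdf`, the integration range and the step size are choices made for this illustration; the check relies on the standard facts that a density integrates to 1 and that the mean of this distribution is kθ.

```python
import math


def gamma_pdf(x, k, theta):
    """Gamma density with shape k and scale theta, for x > 0."""
    return x ** (k - 1) * math.exp(-x / theta) / (theta ** k * math.gamma(k))


k, theta = 2.0, 2.0
step = 0.001
# Midpoints of small intervals covering (0, 60); the tail beyond 60 is negligible here.
xs = [(i + 0.5) * step for i in range(60000)]
total = sum(gamma_pdf(x, k, theta) for x in xs) * step
mean = sum(x * gamma_pdf(x, k, theta) for x in xs) * step
print(round(total, 4), round(mean, 4))
```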
Examples#
Draw samples from the distribution:
>>> import numpy as np
>>> shape, scale = 2., 2.  # mean=4, std=2*sqrt(2)
>>> s = np.random.gamma(shape, scale, 1000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, ignored = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1)*(np.exp(-bins/scale) /
...                      (sps.gamma(shape)*scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()
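The note above recommends the gamma method of a default_rng() instance for new code. A minimal sketch of that usage follows; the seed and sample sizes are arbitrary choices for this illustration. It also shows the size rule: an explicit size fixes the output shape, and with size=None the output follows the broadcast of shape and scale.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
shape, scale = 2.0, 2.0  # mean = 4, std = 2*sqrt(2)

# Large sample: empirical mean and std should be close to the theoretical ones.
s = rng.gamma(shape, scale, size=100_000)
print(s.mean(), s.std())

# Explicit size gives an array of that shape.
grid = rng.gamma(shape, scale, size=(2, 3))
# With size=None, array-valued parameters are broadcast: one draw per shape value.
per_shape = rng.gamma([1.0, 2.0, 3.0], scale)
print(grid.shape, per_shape.shape)
```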
- papierstat.datasets.duration.rand(d0, d1, ..., dn)#
Random values in a given shape.
Note
This is a convenience function for users porting code from Matlab, and wraps random_sample. That function takes a tuple to specify the size of the output, which is consistent with other NumPy functions like numpy.zeros and numpy.ones.
Create an array of the given shape and populate it with random samples from a uniform distribution over [0, 1).
Parameters#
- d0, d1, …, dn : int, optional
The dimensions of the returned array, must be non-negative. If no argument is given a single Python float is returned.
Returns#
- out : ndarray, shape (d0, d1, ..., dn)
Random values.
See Also#
random
Examples#
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],  #random
       [ 0.37601032,  0.25528411],  #random
       [ 0.49313049,  0.94909878]]) #random
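To illustrate the note that rand wraps random_sample, the sketch below draws the same array both ways from the same seed: rand takes the dimensions as separate arguments, random_sample takes a tuple. The last lines show the Generator-based equivalent, `Generator.random`, which is not mentioned in this section and is included here only as a pointer for new code.

```python
import numpy as np

# Same seed, two calling conventions: separate dimensions vs. a tuple.
np.random.seed(0)
a = np.random.rand(3, 2)
np.random.seed(0)
b = np.random.random_sample((3, 2))
print(np.array_equal(a, b))

# Generator API: uniform samples over [0, 1), size given as a tuple.
rng = np.random.default_rng(0)
c = rng.random((3, 2))
print(c.shape)
```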
- papierstat.datasets.duration.randn(d0, d1, ..., dn)#
Return a sample (or samples) from the "standard normal" distribution.
Note
This is a convenience function for users porting code from Matlab, and wraps standard_normal. That function takes a tuple to specify the size of the output, which is consistent with other NumPy functions like numpy.zeros and numpy.ones.
Note
New code should use the standard_normal method of a default_rng() instance instead; please see the Quick Start.
If positive int_like arguments are provided, randn generates an array of shape (d0, d1, ..., dn), filled with random floats sampled from a univariate "normal" (Gaussian) distribution of mean 0 and variance 1. A single float randomly sampled from the distribution is returned if no argument is provided.
Parameters#
- d0, d1, …, dn : int, optional
The dimensions of the returned array, must be non-negative. If no argument is given a single Python float is returned.
Returns#
- Z : ndarray or float
A (d0, d1, ..., dn)-shaped array of floating-point samples from the standard normal distribution, or a single such float if no parameters were supplied.
See Also#
- standard_normal : Similar, but takes a tuple as its argument.
- normal : Also accepts mu and sigma arguments.
- random.Generator.standard_normal : which should be used for new code.
Notes#
For random samples from $N(\mu, \sigma^2)$, use:
sigma * np.random.randn(...) + mu
Examples#
>>> np.random.randn()
2.1923875335537315  # random
Two-by-four array of samples from N(3, 6.25):
>>> 3 + 2.5 * np.random.randn(2, 4)
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677],   # random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]])  # random
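The same shift-and-scale recipe can be written with the standard_normal method of a default_rng() instance, as the note above recommends for new code. The sketch below is illustrative only: the seed and the large sample size are arbitrary, and the large sample serves to check that the empirical mean and variance approach 3 and 6.25.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.5  # target distribution N(3, 6.25)

# standard_normal takes the output shape as a tuple.
x = mu + sigma * rng.standard_normal((2, 4))
print(x.shape)

# A large sample to compare empirical moments with mu and sigma**2.
big = mu + sigma * rng.standard_normal(100_000)
print(big.mean(), big.var())
```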