module optim.sgd

Inheritance diagram of mlstatpy.optim.sgd

Short summary

module mlstatpy.optim.sgd

Implements simple stochastic gradient optimisation. It is inspired from



class mlstatpy.optim.sgd.BaseOptimizer(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)

Bases: object

Base stochastic gradient descent optimizer.

  • coef – array, initial coefficients

  • learning_rate_init – float, the initial learning rate; it controls the step size when updating the weights

  • min_threshold – lower bound for the coefficients (can be None)

  • max_threshold – upper bound for the coefficients (can be None)

  • l1 – L1 regularization

  • l2 – L2 regularization

The class holds the following attributes:

  • learning_rate: float, the current learning rate

  • coef: optimized coefficients

  • min_threshold, max_threshold: coefficients thresholds

  • l2: L2 regularization

  • l1: L1 regularization
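The class documentation does not say how l1 and l2 enter the gradient. The sketch below is a hedged illustration that assumes the usual penalized loss, loss + l1 * sum(|coef|) + l2 * sum(coef ** 2) / 2. The (sub)gradient of that loss adds l1 * sign(coef) + l2 * coef to the gradient of the original loss. The helper regularize_gradient is hypothetical and is not part of mlstatpy.

```python
import numpy


def regularize_gradient(grad, coef, l1=0.0, l2=0.0):
    """Hypothetical helper (not mlstatpy's code): adds L1 and L2 penalty
    terms to the gradient of a loss with respect to coef."""
    grad = numpy.asarray(grad, dtype=float)
    coef = numpy.asarray(coef, dtype=float)
    if l1 > 0:
        # subgradient of l1 * sum(|coef|)
        grad = grad + l1 * numpy.sign(coef)
    if l2 > 0:
        # gradient of l2 * sum(coef ** 2) / 2
        grad = grad + l2 * coef
    return grad


g = regularize_gradient([0.5, -0.2], [1.0, -2.0], l1=0.1, l2=0.01)
# by hand: [0.5 + 0.1 + 0.01, -0.2 - 0.1 - 0.02], approximately [0.61, -0.32]
```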


__init__(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)
_display_progress(it, max_iter, loss, losses=None, msg=None)

Displays training progress.

_evaluate_early_stopping(it, max_iter, losses, early_th, verbose=False)

Applies regularization.



Performs updates to the learning rate and potentially other states at the end of an iteration.


train(X, y, fct_loss, fct_grad, max_iter=100, early_th=None, verbose=False)

Optimizes the coefficients.

  • X – dataset (array)

  • y – expected target

  • fct_loss – loss function, signature: f(coef, X, y) -> float

  • fct_grad – gradient function, signature: g(coef, x, y, i) -> array

  • max_iter – maximum number of iterations

  • early_th – stops the training if the error goes below this threshold

  • verbose – displays information about the training progress



The method keeps the coefficients that reached the minimal loss.
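The contract of train can be illustrated with a plain SGD loop. It evaluates a per-observation gradient, stops early on early_th, and keeps the coefficients with the lowest loss. This is a minimal sketch under assumptions, not mlstatpy's implementation. The function train_sketch, its fixed learning_rate and its seed argument are hypothetical. Observations are visited in random order at each iteration, and no momentum, schedule or regularization is applied.

```python
import numpy


def train_sketch(coef, X, y, fct_loss, fct_grad, learning_rate=0.01,
                 max_iter=100, early_th=None, seed=0):
    """Hypothetical plain SGD loop following the documented train() contract."""
    rng = numpy.random.default_rng(seed)
    coef = numpy.array(coef, dtype=float)
    best_coef, best_loss = coef.copy(), fct_loss(coef, X, y)
    losses = [best_loss]
    for it in range(max_iter):
        # one pass over the data, one observation at a time, in random order
        for i in rng.permutation(X.shape[0]):
            coef -= learning_rate * fct_grad(coef, X[i], y[i], i)
        loss = fct_loss(coef, X, y)
        losses.append(loss)
        if loss < best_loss:
            # keeps the coefficients with the minimal loss
            best_coef, best_loss = coef.copy(), loss
        if early_th is not None and loss < early_th:
            break  # the error went below the threshold
    return best_coef, losses


def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    # gradient of (x @ c - y) ** 2 / 2 for a single observation x
    return x * (x @ c - y)


data_rng = numpy.random.default_rng(1)
X = data_rng.standard_normal((20, 3))
true_coef = numpy.array([0.5, 0.6, -0.7])
y = X @ true_coef

best, losses = train_sketch(numpy.zeros(3), X, y, fct_loss, fct_grad,
                            learning_rate=0.05, max_iter=200, early_th=1e-8)
```

Returning the best coefficients rather than the last ones protects against a final iteration that happens to increase the loss.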



Updates coefficients with given gradient.


grad – array, gradient
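A plain gradient step followed by the threshold constraints could look like the hypothetical helper below. The source does not confirm the mechanism. The sketch assumes that min_threshold and max_threshold are enforced by clipping the coefficients after each update, and it ignores momentum.

```python
import numpy


def update_coef_sketch(coef, grad, learning_rate, min_threshold=None, max_threshold=None):
    """Hypothetical helper: plain gradient step, then clipping to the bounds."""
    coef = numpy.asarray(coef, dtype=float) - learning_rate * numpy.asarray(grad, dtype=float)
    if min_threshold is not None or max_threshold is not None:
        # numpy.clip accepts None for one missing bound, but not for both
        coef = numpy.clip(coef, min_threshold, max_threshold)
    return coef


c = update_coef_sketch([0.2, 0.9, -0.5], [-1.0, -3.0, 8.0], learning_rate=0.1,
                       min_threshold=-1.0, max_threshold=1.0)
# unclipped step: [0.3, 1.2, -1.3]; after clipping: [0.3, 1.0, -1.0]
```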


class mlstatpy.optim.sgd.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)

Bases: BaseOptimizer

Stochastic gradient descent optimizer with momentum.

  • coef – array, initial coefficients

  • learning_rate_init – float, the initial learning rate; it controls the step size when updating the weights

  • lr_schedule – {“constant”, “adaptive”, “invscaling”}, learning rate schedule for weight updates. “constant” keeps the learning rate equal to learning_rate_init. “invscaling” gradually decreases the learning rate learning_rate_ at each time step t using an inverse scaling exponent power_t: learning_rate_ = learning_rate_init / pow(t, power_t). “adaptive” keeps the learning rate equal to learning_rate_init as long as the training loss keeps decreasing; each time two consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if “early_stopping” is on, the current learning rate is divided by 5.

  • momentum – float, value of the momentum, must be larger than or equal to 0

  • power_t – float, exponent for the inverse scaling learning rate

  • early_th – stops if the error goes below that threshold

  • min_threshold – lower bound for parameters (can be None)

  • max_threshold – upper bound for parameters (can be None)

  • l1 – L1 regularization

  • l2 – L2 regularization
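The “invscaling” formula can be checked against the learning rates printed in the linear regression example on this page. That example has 10 observations. Assuming that time_step equals 10 * it + 1 after iteration it, the formula reproduces the logged values. This relation is inferred from the printed lr values and is not stated in the source. The adaptive function is a simplified, hypothetical reading of the “adaptive” description. tol is not a parameter of SGDOptimizer.

```python
def invscaling(learning_rate_init, time_step, power_t=0.5):
    """Inverse scaling: learning_rate_init / pow(time_step, power_t)."""
    return learning_rate_init / pow(time_step, power_t)


def adaptive(learning_rate, losses, tol=1e-4):
    """Hypothetical, simplified "adaptive" rule: divides the learning rate
    by 5 when each of the last two epochs decreased the loss by less than tol."""
    if len(losses) >= 3 and all(losses[-k - 1] - losses[-k] < tol for k in (1, 2)):
        return learning_rate / 5
    return learning_rate


# Assumption inferred from the example output (10 observations):
# after iteration it, time_step = 10 * it + 1.
rates = [invscaling(0.1, 10 * it + 1) for it in range(1, 6)]
print([float(f"{r:.3g}") for r in rates])  # [0.0302, 0.0218, 0.018, 0.0156, 0.014]
```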

The class holds the following attributes:

  • learning_rate: float, the current learning rate

  • velocity: array, velocity used to update params

Stochastic Gradient Descent applied to linear regression

The following example shows how to optimize a simple linear regression.


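import numpy
from mlstatpy.optim import SGDOptimizer

# squared error of the linear model on the whole dataset
def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2

# gradient for a single observation x, a scaled-down version
# of the exact gradient 2 * x * (x @ c - y)
def fct_grad(c, x, y, i=0):
    return x * (x @ c - y) * 0.1

# random dataset generated from known coefficients
coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

# starts from random coefficients, runs at most 15 iterations
sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print('optimized coefficients:', sgd.coef)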


    0/15: loss: 88.79 lr=0.1 max(coef): 1.9 l1=0/2.9 l2=0/4.5
    1/15: loss: 47.22 lr=0.0302 max(coef): 1.7 l1=0.38/2.4 l2=0.084/3.2
    2/15: loss: 39.7 lr=0.0218 max(coef): 1.4 l1=1.8/3 l2=2.5/3.3
    3/15: loss: 22.29 lr=0.018 max(coef): 0.92 l1=0.0063/2.5 l2=2.3e-05/2.1
    4/15: loss: 10.18 lr=0.0156 max(coef): 0.85 l1=0.01/1.9 l2=3.5e-05/1.4
    5/15: loss: 4.194 lr=0.014 max(coef): 0.76 l1=0.00065/1.6 l2=2.4e-07/1.1
    6/15: loss: 1.646 lr=0.0128 max(coef): 0.71 l1=0.065/1.8 l2=0.0018/1.1
    7/15: loss: 0.7677 lr=0.0119 max(coef): 0.66 l1=0.13/1.8 l2=0.0076/1.1
    8/15: loss: 0.4433 lr=0.0111 max(coef): 0.63 l1=0.095/1.8 l2=0.0042/1.1
    9/15: loss: 0.272 lr=0.0105 max(coef): 0.6 l1=0.00051/1.8 l2=8.8e-08/1.1
    10/15: loss: 0.1774 lr=0.00995 max(coef): 0.6 l1=0.00076/1.8 l2=2e-07/1
    11/15: loss: 0.1309 lr=0.00949 max(coef): 0.61 l1=0.001/1.8 l2=3.4e-07/1
    12/15: loss: 0.1065 lr=0.00909 max(coef): 0.61 l1=0.071/1.7 l2=0.0042/1
    13/15: loss: 0.08566 lr=0.00874 max(coef): 0.62 l1=0.0081/1.7 l2=3.2e-05/1
    14/15: loss: 0.07239 lr=0.00842 max(coef): 0.62 l1=0.06/1.7 l2=0.003/1
    15/15: loss: 0.06085 lr=0.00814 max(coef): 0.63 l1=0.056/1.7 l2=0.0025/1
    optimized coefficients: [ 0.541  0.577 -0.629]


__init__(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)
_display_progress(it, max_iter, loss, losses=None, msg='loss')

Displays training progress.


Gets the values used to update params with given gradients.


grad – array, gradient


returns: updates, array, the values to add to params
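The documentation does not spell out how momentum enters these updates. The sketch below assumes classical momentum in the style of scikit-learn's SGDOptimizer. The velocity is updated as velocity = momentum * velocity - learning_rate * grad, and the result is then added to the coefficients. MomentumSketch is a hypothetical class, not mlstatpy's code.

```python
import numpy


class MomentumSketch:
    """Hypothetical sketch of classical momentum updates."""

    def __init__(self, n_coef, learning_rate=0.1, momentum=0.9):
        self.learning_rate = learning_rate
        self.momentum = momentum
        self.velocity = numpy.zeros(n_coef)

    def get_updates(self, grad):
        # velocity <- momentum * velocity - learning_rate * grad
        self.velocity = (self.momentum * self.velocity
                         - self.learning_rate * numpy.asarray(grad, dtype=float))
        # the values to add to the coefficients
        return self.velocity


opt = MomentumSketch(2)
u1 = opt.get_updates([1.0, -2.0])  # [-0.1, 0.2]
u2 = opt.get_updates([1.0, -2.0])  # 0.9 * u1 - 0.1 * grad = [-0.19, 0.38]
```

With a constant gradient, successive updates grow toward learning_rate * grad / (1 - momentum). This acceleration is the usual motivation for momentum.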



Performs updates to the learning rate and potentially other states at the end of an iteration.


time_step – int, number of training samples trained on so far, used to update the learning rate for “invscaling”
