module optim.sgd

Inheritance diagram of mlstatpy.optim.sgd

Short summary

module mlstatpy.optim.sgd

Implements simple stochastic gradient optimisation. It is inspired by _stochastic_optimizers.py.

source on GitHub

Classes

  • BaseOptimizer – Base stochastic gradient descent optimizer.

  • SGDOptimizer – Stochastic gradient descent optimizer with momentum.

Methods

  • __init__

  • _display_progress – Displays training progress.

  • _evaluate_early_stopping

  • _get_updates – Gets the values used to update params with given gradients.

  • _regularize_gradient – Applies regularization.

  • iteration_ends – Performs updates to the learning rate and potentially other states at the end of an iteration.

  • loss_regularization

  • train – Optimizes the coefficients.

  • update_coef – Updates coefficients with given gradient.

Documentation

Implements simple stochastic gradient optimisation. It is inspired by _stochastic_optimizers.py.


class mlstatpy.optim.sgd.BaseOptimizer(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)

Bases: object

Base stochastic gradient descent optimizer.

Parameters
  • coef – array, initial coefficients

  • learning_rate_init – float, the initial learning rate. It controls the step size used when updating the weights.

  • min_threshold – coefficients must be higher than min_threshold (can be None)

  • max_threshold – coefficients must be lower than max_threshold (can be None)

  • l1 – L1 regularization

  • l2 – L2 regularization

The class holds the following attributes:

  • learning_rate: float, the current learning rate

  • coef: optimized coefficients

  • min_threshold, max_threshold: coefficients thresholds

  • l2: L2 regularization

  • l1: L1 regularization


__init__(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)

_display_progress(it, max_iter, loss, losses=None, msg=None)

Displays training progress.

_evaluate_early_stopping(it, max_iter, losses, early_th, verbose=False)

_get_updates(grad)

_regularize_gradient(grad)

Applies regularization.
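The page does not spell out how the l1 and l2 attributes enter the gradient. A common convention, assumed here and not confirmed by this documentation, adds l2 * coef and l1 * sign(coef) to the raw gradient. The helper name regularize_gradient below is hypothetical and only illustrates that convention.

```python
import numpy


def regularize_gradient(grad, coef, l1=0.0, l2=0.0):
    """Hypothetical helper (assumption): adds L1/L2 penalty terms to a gradient."""
    grad = numpy.asarray(grad, dtype=float)
    coef = numpy.asarray(coef, dtype=float)
    if l2 > 0:
        # gradient of the penalty 0.5 * l2 * ||coef||^2
        grad = grad + l2 * coef
    if l1 > 0:
        # subgradient of the penalty l1 * ||coef||_1
        grad = grad + l1 * numpy.sign(coef)
    return grad


coef = numpy.array([0.5, -2.0, 0.0])
grad = numpy.zeros(3)
penalized = regularize_gradient(grad, coef, l1=0.1, l2=0.01)
print(penalized)
```

With both penalties at zero the gradient is returned unchanged, which is consistent with the defaults l1=0.0 and l2=0.0 in the constructor.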


iteration_ends(time_step)

Performs updates to the learning rate and potentially other states at the end of an iteration.


train(X, y, fct_loss, fct_grad, max_iter=100, early_th=None, verbose=False)

Optimizes the coefficients.

Parameters
  • X – dataset (array)

  • y – expected target

  • fct_loss – loss function, signature: f(coef, X, y) -> float

  • fct_grad – gradient function, signature: g(coef, x, y, i) -> array

  • max_iter – maximum number of iterations

  • early_th – stops the training if the error goes below this threshold

  • verbose – displays information

Returns

loss

The method keeps the best coefficients for the minimal loss.
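To make the roles of fct_loss, fct_grad, max_iter and early_th concrete, here is a minimal plain SGD loop written from the description above. It is a sketch under assumptions, not the mlstatpy implementation: the function sgd_train, its fixed learning rate lr, and the seed argument are hypothetical, and the loop is assumed to visit every row once per iteration in random order.

```python
import numpy


def sgd_train(coef, X, y, fct_loss, fct_grad, lr=0.05, max_iter=100,
              early_th=None, seed=0):
    """Hypothetical plain SGD loop (assumption, not the library's code)."""
    rng = numpy.random.default_rng(seed)
    coef = numpy.array(coef, dtype=float)
    best_coef, best_loss = coef.copy(), fct_loss(coef, X, y)
    for _ in range(max_iter):
        # one pass over the rows in random order, one gradient step per row
        for i in rng.permutation(X.shape[0]):
            coef -= lr * fct_grad(coef, X[i], y[i], i)
        loss = fct_loss(coef, X, y)
        # keep the coefficients that gave the lowest loss so far
        if loss < best_loss:
            best_coef, best_loss = coef.copy(), loss
        # stop early once the loss goes below the threshold
        if early_th is not None and loss <= early_th:
            break
    return best_coef, best_loss


def fct_loss(c, X, y):
    # squared error over the whole dataset, signature f(coef, X, y) -> float
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    # gradient of (x @ c - y) ** 2 for one row, signature g(coef, x, y, i) -> array
    return 2 * x * (x @ c - y)


rng = numpy.random.default_rng(1)
true_coef = numpy.array([0.5, 0.6, -0.7])
X = rng.standard_normal((50, 3))
y = X @ true_coef
start = numpy.zeros(3)
best_coef, best_loss = sgd_train(start, X, y, fct_loss, fct_grad, early_th=1e-8)
print(best_coef, best_loss)
```

The loss is evaluated on the full dataset while the gradient only sees one row, which is why the two callbacks have different signatures.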


update_coef(grad)

Updates coefficients with given gradient.

Parameters

grad – array, gradient
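The constructor documents min_threshold and max_threshold as bounds on the coefficients. One plausible way to enforce them, assumed here for illustration, is to clip the coefficients after each update. The helper clipped_update is hypothetical, and updates stands for the values to add to the coefficients.

```python
import numpy


def clipped_update(coef, updates, min_threshold=None, max_threshold=None):
    """Hypothetical helper (assumption): applies an update, then enforces the bounds."""
    new_coef = numpy.asarray(coef, dtype=float) + updates
    if min_threshold is not None:
        # coefficients may not go below min_threshold
        new_coef = numpy.maximum(new_coef, min_threshold)
    if max_threshold is not None:
        # coefficients may not go above max_threshold
        new_coef = numpy.minimum(new_coef, max_threshold)
    return new_coef


coef = numpy.array([0.2, 0.9, -0.4])
updates = numpy.array([-0.5, 0.5, 0.1])
new_coef = clipped_update(coef, updates, min_threshold=0.0, max_threshold=1.0)
print(new_coef)
```

Leaving either bound at None disables that side of the clipping, matching the default values in the signature.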


class mlstatpy.optim.sgd.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)

Bases: mlstatpy.optim.sgd.BaseOptimizer

Stochastic gradient descent optimizer with momentum.

Parameters
  • coef – array, initial coefficients

  • learning_rate_init – float, the initial learning rate. It controls the step size used when updating the weights.

  • lr_schedule – {“constant”, “adaptive”, “invscaling”}, learning rate schedule for weight updates. “constant” keeps the learning rate equal to learning_rate_init. “invscaling” gradually decreases the learning rate learning_rate_ at each time step t using an inverse scaling exponent power_t: learning_rate_ = learning_rate_init / pow(t, power_t). “adaptive” keeps the learning rate equal to learning_rate_init as long as the training loss keeps decreasing; each time 2 consecutive epochs fail to decrease the training loss by tol, or fail to increase the validation score by tol if “early_stopping” is on, the current learning rate is divided by 5.

  • momentum – float, value of the momentum, must be greater than or equal to 0

  • power_t – double, the exponent for the inverse scaling learning rate

  • early_th – stops if the error goes below that threshold

  • min_threshold – lower bound for parameters (can be None)

  • max_threshold – upper bound for parameters (can be None)

  • l1 – L1 regularization

  • l2 – L2 regularization

The class holds the following attributes:

  • learning_rate: float, the current learning rate

  • velocity: array, velocity used to update the parameters

Stochastic Gradient Descent applied to linear regression

The following example shows how to optimize a simple linear regression.

<<<

import numpy
from mlstatpy.optim import SGDOptimizer


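# Squared error over the whole dataset.
def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2


# Gradient of the squared error for a single observation x, up to a constant factor.
def fct_grad(c, x, y, i=0):
    return x * (x @ c - y) * 0.1


# Noise-free data generated from known coefficients.
coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

# The optimizer starts from random coefficients.
sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print('optimized coefficients:', sgd.coef)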

>>>

    0/15: loss: 66.62 lr=0.1 max(coef): 2.1 l1=0/3.5 l2=0/6.3
    1/15: loss: 47.39 lr=0.0302 max(coef): 1.6 l1=0.084/3 l2=0.0027/4
    2/15: loss: 28.83 lr=0.0218 max(coef): 1.1 l1=1.2/2.4 l2=0.48/2.1
    3/15: loss: 19.21 lr=0.018 max(coef): 0.78 l1=0.86/2 l2=0.26/1.4
    4/15: loss: 13.44 lr=0.0156 max(coef): 0.89 l1=0.17/1.6 l2=0.012/1.1
    5/15: loss: 10.85 lr=0.014 max(coef): 0.96 l1=0.28/1.4 l2=0.027/1
    6/15: loss: 9.143 lr=0.0128 max(coef): 0.97 l1=0.12/1.2 l2=0.0054/0.96
    7/15: loss: 7.195 lr=0.0119 max(coef): 0.9 l1=0.6/1 l2=0.16/0.81
    8/15: loss: 5.226 lr=0.0111 max(coef): 0.78 l1=0.39/0.78 l2=0.053/0.6
    9/15: loss: 3.933 lr=0.0105 max(coef): 0.69 l1=0.0055/0.81 l2=1.6e-05/0.48
    10/15: loss: 3.275 lr=0.00995 max(coef): 0.63 l1=0.085/0.81 l2=0.0031/0.41
    11/15: loss: 2.856 lr=0.00949 max(coef): 0.6 l1=0.039/0.84 l2=0.00057/0.39
    12/15: loss: 2.593 lr=0.00909 max(coef): 0.59 l1=0.013/0.87 l2=9.2e-05/0.39
    13/15: loss: 2.323 lr=0.00874 max(coef): 0.58 l1=0.082/0.9 l2=0.0023/0.39
    14/15: loss: 2.194 lr=0.00842 max(coef): 0.56 l1=0.012/0.91 l2=5.9e-05/0.38
    15/15: loss: 2.012 lr=0.00814 max(coef): 0.55 l1=0.02/0.93 l2=0.00022/0.38
    optimized coefficients: [ 0.163  0.549 -0.219]


__init__(coef, learning_rate_init=0.1, lr_schedule='invscaling', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None, l1=0.0, l2=0.0)

_display_progress(it, max_iter, loss, losses=None, msg='loss')

Displays training progress.

_get_updates(grad)

Gets the values used to update params with given gradients.

Parameters

grad – array, gradient

Returns

updates, array, the values to add to params
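The documentation says the optimizer uses momentum and keeps a velocity array, but it does not give the update rule. The sketch below assumes the classical momentum formula, velocity = momentum * velocity - learning_rate * grad, with the new velocity returned as the values to add to the coefficients. The helper momentum_updates is hypothetical.

```python
import numpy


def momentum_updates(velocity, grad, learning_rate, momentum=0.9):
    """Hypothetical helper (assumption): one classical momentum step."""
    # the previous velocity is damped by momentum, then pushed against the gradient
    return momentum * velocity - learning_rate * grad


velocity = numpy.zeros(2)
grad = numpy.array([1.0, -2.0])

velocity = momentum_updates(velocity, grad, learning_rate=0.1)
first = velocity.copy()

velocity = momentum_updates(velocity, grad, learning_rate=0.1)
second = velocity.copy()

print(first, second)
```

With a constant gradient the step grows from one call to the next, which is the acceleration effect momentum is meant to provide; momentum=0 reduces the rule to plain gradient descent.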


iteration_ends(time_step)

Performs updates to the learning rate and potentially other states at the end of an iteration.

Parameters

time_step – int, number of training samples processed so far, used to update the learning rate for “invscaling”
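The “invscaling” schedule is documented as learning_rate_init / pow(t, power_t). The sketch below assumes t = time_step + 1, which is not stated on this page. Under that assumption, with 10 training samples per iteration, it reproduces the first lr values printed in the linear regression example above (0.0302, 0.0218, 0.018). The helper invscaling_learning_rate is hypothetical.

```python
def invscaling_learning_rate(learning_rate_init, time_step, power_t=0.5):
    """Hypothetical helper (assumption): inverse scaling with t = time_step + 1."""
    return learning_rate_init / (time_step + 1) ** power_t


n_samples = 10
# time_step after 1, 2 and 3 passes over the 10 samples
rates = [invscaling_learning_rate(0.1, epoch * n_samples) for epoch in (1, 2, 3)]
print(["%.3g" % r for r in rates])  # ['0.0302', '0.0218', '0.018']
```

A larger power_t makes the learning rate decay faster, while power_t=0 leaves it constant at learning_rate_init.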
