Optimization

BaseOptimizer

class aftercovid.optim.sgd.BaseOptimizer(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None)[source]

Base stochastic gradient descent optimizer.

Parameters:
  • coef – array, initial coefficient

  • learning_rate_init – float, the initial learning rate used. It controls the step size in updating the weights.

  • min_threshold – coefficients must be higher than min_threshold

  • max_threshold – coefficients must be lower than max_threshold

The class holds the following attributes:

  • learning_rate: float, the current learning rate

iteration_ends(time_step)[source]

Performs updates to the learning rate and potentially to other states at the end of an iteration.

train(X, y, fct_loss, fct_grad, max_iter=100, early_th=None, verbose=False)[source]

Optimizes the coefficients.

Parameters:
  • X – dataset (array)

  • y – expected target

  • fct_loss – loss function, signature: f(coef, X, y) -> float

  • fct_grad – gradient function, signature: g(coef, x, y, i) -> array (see the sketch below)

  • max_iter – maximum number of iterations

  • early_th – stops the training if the error goes below this threshold

  • verbose – display information

Returns:

loss
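
A minimal sketch of callables matching these two signatures, using a least-squares loss (hypothetical helpers; the complete example in the SGDOptimizer section below follows the same pattern):

<<<

import numpy


# loss over the full dataset, signature f(coef, X, y) -> float
def fct_loss(coef, X, y):
    return numpy.linalg.norm(X @ coef - y) ** 2


# gradient for a single sample (x, y), signature g(coef, x, y, i) -> array;
# i is the index of the sample in the dataset
def fct_grad(coef, x, y, i=0):
    return 2 * x * (x @ coef - y)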

update_coef(grad)[source]

Updates coefficients with given gradient.

Parameters:

grad – array, gradient
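
The update is a plain gradient step followed by the bound constraints. A minimal sketch of the logic, assuming the thresholds are enforced by clipping (an assumption about the implementation, not a quote of it):

<<<

import numpy


def update(coef, grad, learning_rate, min_threshold=None, max_threshold=None):
    # hypothetical equivalent of update_coef
    coef = coef - learning_rate * grad  # gradient step
    if min_threshold is not None:
        coef = numpy.maximum(coef, min_threshold)  # keep coefficients above the lower bound
    if max_threshold is not None:
        coef = numpy.minimum(coef, max_threshold)  # keep coefficients below the upper bound
    return coef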

SGDOptimizer

class aftercovid.optim.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='constant', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None)[source]

Stochastic gradient descent optimizer with momentum.

Parameters:
  • coef – array, initial coefficient

  • learning_rate_init – float, the initial learning rate used. It controls the step size in updating the weights.

  • lr_schedule – {‘constant’, ‘adaptive’, ‘invscaling’}, learning rate schedule for weight updates. ‘constant’ keeps the learning rate constant at learning_rate_init. ‘invscaling’ gradually decreases the learning rate learning_rate_ at each time step t using an inverse scaling exponent of power_t: learning_rate_ = learning_rate_init / pow(t, power_t). ‘adaptive’ keeps the learning rate constant at learning_rate_init as long as the training loss keeps decreasing; each time two consecutive epochs fail to decrease the training loss, the current learning rate is divided by 5. A sketch of these schedules follows this parameter list.

  • momentum – float, value of the momentum term; must be greater than or equal to 0.

  • power_t – double, the exponent for the inverse scaling learning rate.

  • early_th – stops if the error goes below that threshold

  • min_threshold – lower bound for parameters (can be None)

  • max_threshold – upper bound for parameters (can be None)
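
As announced in the lr_schedule description, here is a sketch of how the schedules behave, using only the formulas quoted above (an illustration, not the library's code):

<<<

learning_rate_init = 0.1
power_t = 0.5

# 'invscaling': learning_rate_ = learning_rate_init / pow(t, power_t)
for t in (1, 4, 16, 100):
    print(t, learning_rate_init / t ** power_t)  # 0.1, 0.05, 0.025, 0.01

# 'adaptive': the rate stays at learning_rate_init while the loss decreases;
# after two consecutive epochs without improvement it is divided by 5
learning_rate = learning_rate_init / 5  # 0.02 after the first reduction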

The class holds the following attributes:

  • learning_rate: float, the current learning rate

  • velocity: array, velocities used to update the parameters (see the sketch below)
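
A minimal sketch of the classic momentum update that velocity corresponds to (standard SGD with momentum; hypothetical code, the implementation may differ in details):

<<<

def momentum_step(coef, velocity, grad, learning_rate, momentum=0.9):
    # the velocity accumulates a decaying sum of past gradients
    velocity = momentum * velocity - learning_rate * grad
    # the coefficients move along the velocity, not the raw gradient
    return coef + velocity, velocity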

Stochastic Gradient Descent applied to linear regression

The following example shows how to optimize a simple linear regression.

<<<

import numpy
from aftercovid.optim import SGDOptimizer


def fct_loss(c, X, y):
    # squared error of the linear model over the whole dataset
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    # scaled gradient of the squared error for one sample (x, y)
    return x * (x @ c - y) * 0.1


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print('optimized coefficients:', sgd.coef)

>>>

    0/15: loss: 13.64 lr=0.1
    1/15: loss: 8.428 lr=0.1
    2/15: loss: 5.17 lr=0.1
    3/15: loss: 1.513 lr=0.1
    4/15: loss: 0.6043 lr=0.1
    5/15: loss: 0.2803 lr=0.1
    6/15: loss: 0.1244 lr=0.1
    7/15: loss: 0.04362 lr=0.1
    8/15: loss: 0.02227 lr=0.1
    9/15: loss: 0.00653 lr=0.1
    10/15: loss: 0.002487 lr=0.1
    11/15: loss: 0.0002025 lr=0.1
    12/15: loss: 0.0007033 lr=0.1
    13/15: loss: 0.0002937 lr=0.1
    14/15: loss: 0.0002646 lr=0.1
    15/15: loss: 0.0002461 lr=0.1
    optimized coefficients: [ 0.496  0.6   -0.697]
iteration_ends(time_step)[source]

Performs updates to the learning rate and potentially to other states at the end of an iteration.

Parameters:

time_step – int, number of training samples trained on so far; used to update the learning rate for ‘invscaling’
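
Building on the example above, the same regression can be trained with a decaying learning rate. A hypothetical run using only the constructor parameters documented in this section:

<<<

import numpy
from aftercovid.optim import SGDOptimizer


def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    return x * (x @ c - y) * 0.1


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

# 'invscaling' decays the rate as learning_rate_init / t ** power_t
sgd = SGDOptimizer(numpy.random.randn(3), learning_rate_init=0.1,
                   lr_schedule='invscaling', power_t=0.5)
sgd.train(X, y, fct_loss, fct_grad, max_iter=15)
print('optimized coefficients:', sgd.coef)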