Optimization¶
BaseOptimizer¶
- class aftercovid.optim.sgd.BaseOptimizer(coef, learning_rate_init=0.1, min_threshold=None, max_threshold=None)[source]¶
Base stochastic gradient descent optimizer.
- Parameters:
coef – array, initial coefficient
learning_rate_init – float, the initial learning rate. It controls the step size when updating the weights.
min_threshold – coefficients must be higher than min_threshold
max_threshold – coefficients must be lower than max_threshold
The class holds the following attributes:
learning_rate: float, the current learning rate
- iteration_ends(time_step)[source]¶
Performs updates to the learning rate and potentially other states at the end of an iteration.
- train(X, y, fct_loss, fct_grad, max_iter=100, early_th=None, verbose=False)[source]¶
Optimizes the coefficients.
- Parameters:
X – datasets (array)
y – expected target
fct_loss – loss function, signature: f(coef, X, y) -> float
fct_grad – gradient function, signature: g(coef, x, y, i) -> array
max_iter – maximum number of iterations
early_th – stops the training if the error goes below this threshold
verbose – display information
- Returns:
loss
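To make the expected signatures concrete, here is a minimal sketch (not part of the library) of a loss and gradient pair for a linear least-squares model, together with a hypothetical subclass overriding iteration_ends: fct_loss receives the full dataset, fct_grad receives a single sample, and the sketch assumes the documented learning_rate attribute is writable.
<<<

import numpy
from aftercovid.optim.sgd import BaseOptimizer


def fct_loss(c, X, y):
    # loss over the whole dataset: f(coef, X, y) -> float
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    # gradient for one sample x (row i of X): g(coef, x, y, i) -> array
    return 2 * x * (x @ c - y)


class DecayOptimizer(BaseOptimizer):
    "Hypothetical subclass: decays the learning rate after each iteration."

    def iteration_ends(self, time_step):
        # simple exponential decay, not the base class behaviour
        self.learning_rate *= 0.99


X = numpy.random.randn(10, 3)
y = X @ numpy.array([0.5, 0.6, -0.7])
opt = DecayOptimizer(numpy.random.randn(3))
opt.train(X, y, fct_loss, fct_grad, max_iter=20)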
SGDOptimizer¶
- class aftercovid.optim.SGDOptimizer(coef, learning_rate_init=0.1, lr_schedule='constant', momentum=0.9, power_t=0.5, early_th=None, min_threshold=None, max_threshold=None)[source]¶
Stochastic gradient descent optimizer with momentum.
- Parameters:
coef – array, initial coefficient
learning_rate_init – float, the initial learning rate. It controls the step size when updating the weights.
lr_schedule – {‘constant’, ‘adaptive’, ‘invscaling’}, learning rate schedule for weight updates (see the sketch after this list). ‘constant’: keeps the learning rate constant at learning_rate_init. ‘invscaling’: gradually decreases the learning rate learning_rate_ at each time step t using an inverse scaling exponent power_t: learning_rate_ = learning_rate_init / pow(t, power_t). ‘adaptive’: keeps the learning rate constant at learning_rate_init as long as the training loss keeps decreasing; each time two consecutive epochs fail to decrease the training loss by tol (or to increase the validation score by tol if early stopping is on), the current learning rate is divided by 5.
momentum – float, value of the momentum term; must be greater than or equal to 0.
power_t – double, the exponent for the inverse scaling learning rate.
early_th – stops if the error goes below that threshold
min_threshold – lower bound for parameters (can be None)
max_threshold – upper bound for parameters (can be None)
The class holds the following attributes:
learning_rate: float, the current learning rate
velocity_: array, velocities used to update the parameters
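The three schedules can be summarized with a small sketch. This paraphrases the description above rather than the library's actual code, and next_learning_rate is a hypothetical helper name:
<<<

def next_learning_rate(schedule, lr, lr_init, t, power_t=0.5,
                       loss_stalled=False):
    # Sketch of the schedules described above (hypothetical helper).
    if schedule == 'constant':
        # learning rate never changes
        return lr_init
    if schedule == 'invscaling':
        # learning_rate_ = learning_rate_init / pow(t, power_t)
        return lr_init / max(t, 1) ** power_t
    if schedule == 'adaptive':
        # divided by 5 when two consecutive epochs fail to improve the loss
        return lr / 5 if loss_stalled else lr
    raise ValueError("unknown schedule %r" % schedule)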
Stochastic Gradient Descent applied to linear regression
The following example shows how to optimize a simple linear regression.
<<<
import numpy
from aftercovid.optim import SGDOptimizer


def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    return x * (x @ c - y) * 0.1


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

sgd = SGDOptimizer(numpy.random.randn(3))
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print('optimized coefficients:', sgd.coef)
>>>
0/15: loss: 20.66 lr=0.1
1/15: loss: 3.252 lr=0.1
2/15: loss: 1.241 lr=0.1
3/15: loss: 0.2116 lr=0.1
4/15: loss: 0.2207 lr=0.1
5/15: loss: 0.09364 lr=0.1
6/15: loss: 0.006147 lr=0.1
7/15: loss: 0.0135 lr=0.1
8/15: loss: 0.009491 lr=0.1
9/15: loss: 0.003145 lr=0.1
10/15: loss: 0.001171 lr=0.1
11/15: loss: 0.0002995 lr=0.1
12/15: loss: 0.0002034 lr=0.1
13/15: loss: 2.833e-05 lr=0.1
14/15: loss: 1.762e-05 lr=0.1
15/15: loss: 1.887e-06 lr=0.1
optimized coefficients: [ 0.5 0.6 -0.7]
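The same example can also be run with a decaying schedule. The sketch below relies only on the parameters documented above and mirrors the data setup of the previous snippet:
<<<

import numpy
from aftercovid.optim import SGDOptimizer


def fct_loss(c, X, y):
    return numpy.linalg.norm(X @ c - y) ** 2


def fct_grad(c, x, y, i=0):
    return x * (x @ c - y) * 0.1


coef = numpy.array([0.5, 0.6, -0.7])
X = numpy.random.randn(10, 3)
y = X @ coef

# 'invscaling' divides the initial rate by pow(t, power_t) at step t
sgd = SGDOptimizer(numpy.random.randn(3), lr_schedule='invscaling',
                   power_t=0.5)
sgd.train(X, y, fct_loss, fct_grad, max_iter=15, verbose=True)
print('optimized coefficients:', sgd.coef)

With a momentum greater than 0 (the default is 0.9), the velocity_ attribute accumulates past gradients, which usually smooths the trajectory of the coefficients.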