
# blog page - 1/1

## scikit-learn 0.23

2021-01-03

The unit tests are run against
scikit-learn 0.23 and 0.24.
Some unit tests fail with version 0.23.
They were disabled rather than investigated, since the cause
does not appear with the latest version.
The failures affect all classes inheriting from `SkBase`
when a model using them is trained. The issue happens in *joblib*.

## scikit-learn internal API

2020-09-02

The signature of method `impurity_improvement` will change in version 0.24. Supporting two versions of scikit-learn is usually easy, even for a method overloaded in a class, except that this method is implemented in Cython. The method must be overloaded the same way, with the same signature. The way it was handled is implemented in PR 88.

…

## Nogil, numpy, cython

2019-03-25

I had to implement a custom criterion to optimize
a decision tree and I wanted to leverage scikit-learn
instead of writing my own implementation. Version 0.21 of scikit-learn
introduced some changes in the API which make it possible
to overload an existing criterion and replace some of its logic
with another one: _criterion.pyx.
The purpose was to show that a fast implementation requires
some tricks (see Custom DecisionTreeRegressor adapted to a linear regression, and
piecewise_tree_regression_criterion.pyx and
piecewise_tree_regression_criterion_fast.pyx
for the code). Other than that, every function to overload is marked as
*nogil*. A function or method marked as *nogil* runs without holding
the GIL (see also PEP-0311),
which means no Python object can be created in that method.
In fact, no Python code can be called inside a Cython
method protected with *nogil*. The consequence is that
no numpy method can be called.

…

## Faster Polynomial Features

2019-02-15

The current implementation of
PolynomialFeatures
in *scikit-learn* computes each new feature
independently, which increases the amount of
data exchanged between *numpy* and *Python*.
The idea of the implementation in
`ExtendedFeatures`
is to reduce this amount with broadcast multiplications.
The second optimization comes from transposing the matrix:
dense matrices are organized by rows in memory so
it is faster to multiply two rows than two columns.
See Faster Polynomial Features.
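
The two ideas above can be illustrated with a minimal sketch for degree 2 only. It is not the code of `ExtendedFeatures`: the function names `poly2_columnwise` and `poly2_broadcast` are hypothetical, and the column order (constant, linear terms, then products) is an assumption made for the example.

```python
import numpy as np


def poly2_columnwise(X):
    """Baseline: builds every product x_i * x_j with a separate numpy call."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols.extend(X[:, j] for j in range(d))
    for i in range(d):
        for j in range(i, d):
            cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)


def poly2_broadcast(X):
    """Same features, one broadcast product per feature i on the transposed matrix."""
    n, d = X.shape
    n_out = 1 + d + d * (d + 1) // 2
    # Transposed copy: every feature becomes a contiguous row in memory.
    XT = np.ascontiguousarray(X.T)
    out = np.empty((n_out, n), dtype=X.dtype)
    out[0, :] = 1
    out[1:d + 1, :] = XT
    pos = d + 1
    for i in range(d):
        k = d - i
        # A single call computes x_i * x_i, x_i * x_{i+1}, ..., x_i * x_{d-1}.
        out[pos:pos + k, :] = XT[i] * XT[i:]
        pos += k
    return out.T


X = np.random.RandomState(0).rand(7, 4)
assert np.allclose(poly2_columnwise(X), poly2_broadcast(X))
```

The broadcast version makes `d` calls to numpy for the products instead of `d * (d + 1) / 2`, which is where the reduction in exchanges between *numpy* and *Python* comes from.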

## Piecewise Linear Regression

2019-02-10

I decided to turn one of the notebooks I wrote about
Piecewise Linear Regression
into something usable that follows
the *scikit-learn* API:
`PiecewiseRegression`,
along with another notebook, Piecewise linear regression with scikit-learn predictors.

## Predictable t-SNE

2019-02-01

t-SNE is quite an interesting tool to
visualize data on a map but it has one drawback:
results are not reproducible. It is much more powerful
than a PCA but the results are difficult to
interpret. Based on some experiments, if t-SNE
manages to separate classes, there is a good chance that
a classifier can achieve good performance. Anyhow, I implemented
a regressor which approximates the t-SNE outputs
so that they can be used as features for a further classifier.
I created a notebook, Predictable t-SNE, and a new transform
`PredictableTSNE`.
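
The idea of approximating t-SNE outputs with a regressor can be sketched as follows. This is only an illustration under assumptions, not the implementation of `PredictableTSNE`: the synthetic data, the choice of `KNeighborsRegressor` and all parameter values are picked for the example.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data with two well separated groups (assumption for the example).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (40, 5)), rng.normal(5, 1, (40, 5))])
X_train, X_new = X[::2], X[1::2]

# Step 1: t-SNE only produces coordinates for the data it was fitted on.
embedding = TSNE(n_components=2, perplexity=10,
                 random_state=0).fit_transform(X_train)

# Step 2: a multi-output regressor learns the mapping X -> t-SNE coordinates.
approx = KNeighborsRegressor(n_neighbors=3).fit(X_train, embedding)

# Step 3: the regressor projects unseen points into the same 2D space,
# and these coordinates can feed a further classifier.
new_features = approx.predict(X_new)
print(new_features.shape)
```

Any regressor supporting several outputs could replace the nearest neighbors model used here.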

## Pipeline visualization

2019-02-01

scikit-learn introduced a nice feature to process mixed-type columns in a single pipeline which follows the scikit-learn API: ColumnTransformer, FeatureUnion and Pipeline. The ideas are not new but they are finally taking shape in scikit-learn.

…

## Quantile regression with scikit-learn.

2018-05-07

scikit-learn does not have any quantile regression.
statsmodels does have one,
QuantReg,
but I wanted to try something I did for my teachings,
Régression Quantile,
based on Iteratively reweighted least squares.
I thought it was a good case study to turn a simple algorithm into
a learner scikit-learn can reuse in a pipeline.
The notebook Quantile Regression demonstrates it
and it is implemented in
`QuantileLinearRegression`.
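
A quantile regression based on iteratively reweighted least squares can be sketched with *numpy* alone. The function `quantile_irls` below is hypothetical and is not the code of `QuantileLinearRegression`. It assumes the usual reweighting scheme where each observation receives the weight `quantile / |residual|` when the residual is positive and `(1 - quantile) / |residual|` otherwise, so that the weighted squared error mimics the quantile loss.

```python
import numpy as np


def quantile_irls(X, y, quantile=0.5, n_iter=50, eps=1e-6):
    """Linear quantile regression through iteratively reweighted least squares.

    Returns the coefficients, intercept first.
    """
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.ones(len(y))
    for _ in range(n_iter):
        # Weighted least squares: rows are scaled by the square root of the weights.
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
        r = y - Xb @ coef
        # w_i * r_i ** 2 is close to the quantile loss of r_i;
        # eps avoids a division by zero for points exactly on the line.
        w = np.where(r >= 0, quantile, 1 - quantile) / np.maximum(np.abs(r), eps)
    return coef


# Data following y = 1 + 2 x with noise and a few large outliers
# (assumption for the example).
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 101)
y = 1 + 2 * x + rng.normal(0, 0.5, x.shape)
y[::10] += 50

coef_median = quantile_irls(x.reshape((-1, 1)), y, quantile=0.5)
print(coef_median)
```

With `quantile=0.5` the fit is a median regression, which is much less sensitive to the outliers than ordinary least squares.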

## Function to get insights on machine learned models

2017-11-18

Machine learned models are black boxes. The module tries to implement some functions to get insights on machine learned models.
