
2019-02 - 1/1#

Faster Polynomial Features#

2019-02-15

The current implementation of PolynomialFeatures in scikit-learn computes each new feature independently, which increases the amount of data exchanged between numpy and Python. The implementation in ExtendedFeatures reduces these exchanges by using broadcast multiplications. A second optimization comes from transposing the matrix: dense matrices are stored by rows in memory, so it is faster to multiply two rows than two columns. See Faster Polynomial Features.
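The two ideas can be illustrated with a minimal numpy sketch for degree 2. This is not the code of ExtendedFeatures: the function names `poly2_naive` and `poly2_broadcast` are hypothetical, and the column order (bias, linear terms, then products) is an assumption chosen for the example.

```python
import numpy as np
from itertools import combinations_with_replacement


def poly2_naive(X):
    """Degree-2 features, one Python-level multiplication per new column."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(d)]
    for i, j in combinations_with_replacement(range(d), 2):
        cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols)


def poly2_broadcast(X):
    """Same features, one broadcast multiplication per input feature."""
    n, d = X.shape
    # Transpose so that each feature becomes a contiguous row.
    Xt = np.ascontiguousarray(X.T)
    n_out = 1 + d + d * (d + 1) // 2
    out = np.empty((n_out, n), dtype=X.dtype)
    out[0] = 1            # bias term
    out[1:d + 1] = Xt     # linear terms
    pos = d + 1
    for i in range(d):
        k = d - i
        # Row i times rows i..d-1 in a single numpy call.
        np.multiply(Xt[i], Xt[i:], out=out[pos:pos + k])
        pos += k
    return out.T


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
same = np.allclose(poly2_naive(X), poly2_broadcast(X))
```

The naive version makes d(d+1)/2 calls from Python into numpy, while the broadcast version makes only d, each of them operating on contiguous rows.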


Piecewise Linear Regression#

2019-02-10

I decided to turn one of the notebooks I wrote about Piecewise Linear Regression into something usable that follows the scikit-learn API: PiecewiseRegression, along with another notebook, Piecewise linear regression with scikit-learn predictors.
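As an illustration of what following the scikit-learn API means here, the sketch below wraps a piecewise linear model in an estimator with `fit` and `predict`. It is not the PiecewiseRegression code: the class name `PiecewiseLinearSketch` is hypothetical, and using a decision tree to split the data into buckets is an assumption made for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class PiecewiseLinearSketch(RegressorMixin, BaseEstimator):
    """Hypothetical estimator: a tree defines buckets, one linear model per bucket."""

    def __init__(self, min_samples_leaf=20):
        self.min_samples_leaf = min_samples_leaf

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # The tree is only used to assign each sample to a leaf (bucket).
        self.binner_ = DecisionTreeRegressor(
            min_samples_leaf=self.min_samples_leaf, random_state=0
        ).fit(X, y)
        leaves = self.binner_.apply(X)
        # One linear regression per leaf.
        self.models_ = {
            leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
            for leaf in np.unique(leaves)
        }
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        leaves = self.binner_.apply(X)
        pred = np.empty(X.shape[0])
        for leaf, model in self.models_.items():
            mask = leaves == leaf
            if mask.any():
                pred[mask] = model.predict(X[mask])
        return pred


# A V-shaped target that a single line cannot fit.
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = np.abs(X.ravel() - 5)
piecewise_r2 = PiecewiseLinearSketch().fit(X, y).score(X, y)
linear_r2 = LinearRegression().fit(X, y).score(X, y)
```

Because the class inherits from `BaseEstimator` and `RegressorMixin`, it gets `get_params`, `set_params` and `score` and can be placed inside a scikit-learn pipeline or grid search.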


Predictable t-SNE#

2019-02-01

t-SNE is an interesting tool for visualizing data on a map, but it has one drawback: results are not reproducible. It is much more powerful than a PCA, but the results are difficult to interpret. Based on some experiments, if t-SNE manages to separate classes, there is a good chance that a classifier can achieve good performance. I implemented a regressor which approximates the t-SNE outputs so that they can be used as features for a downstream classifier. I created a notebook, Predictable t-SNE, and a new transform, PredictableTSNE.
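The idea of approximating t-SNE outputs with a regressor can be sketched as follows. This is not the PredictableTSNE implementation: the choice of `KNeighborsRegressor`, the iris dataset and the train/new split are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsRegressor

X, y = load_iris(return_X_y=True)
# Keep half of the samples aside to play the role of unseen data.
X_train, X_new = X[::2], X[1::2]

# Step 1: compute the t-SNE embedding on the training data only.
embedding = TSNE(n_components=2, perplexity=20, random_state=0).fit_transform(X_train)

# Step 2: learn a mapping from the original features to the embedding.
regressor = KNeighborsRegressor(n_neighbors=5).fit(X_train, embedding)

# Step 3: project unseen samples without running t-SNE again.
embedding_new = regressor.predict(X_new)
```

The predicted coordinates in `embedding_new` can then be passed to a classifier as two extra features.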


Pipeline visualization#

2019-02-01

scikit-learn introduced a nice feature for processing columns of mixed types in a single pipeline that follows the scikit-learn API: ColumnTransformer, FeatureUnion and Pipeline. The ideas are not new, but they are finally taking shape in scikit-learn.
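A small example of such a pipeline is shown below. The toy DataFrame, its column names and the choice of `LogisticRegression` are assumptions made for the illustration, and FeatureUnion is left out to keep it short.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data mixing numerical and categorical columns (made up for the example).
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 29, 63],
    "income": [28.0, 52.5, 61.0, 47.0, 33.5, 39.0],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Nice"],
})
y = [0, 1, 1, 1, 0, 0]

# Each group of columns gets its own preprocessing.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and model are chained in a single estimator.
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(df, y)
pred = model.predict(df)
n_features = model.named_steps["preprocess"].transform(df).shape[1]
```

The two numerical columns are scaled and the categorical column is expanded into one indicator per city, so the classifier receives five features.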

…

