module mlmodel.kmeans_constraint#
Short summary#
module mlinsights.mlmodel.kmeans_constraint
Implements the class ConstraintKMeans.
Classes#
class | truncated documentation |
---|---|
ConstraintKMeans | Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized … |
Properties#
property | truncated documentation |
---|---|
 | HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should … |
Methods#
method | truncated documentation |
---|---|
 | Computes edges between clusters based on a Delaunay … |
constraint_kmeans | Completes the constraint k-means. |
fit | Compute k-means clustering. |
predict | Computes the predictions. |
score | Returns the distances to all clusters. |
transform | Computes the predictions. |
Documentation#
Implements the class ConstraintKMeans.
- class mlinsights.mlmodel.kmeans_constraint.ConstraintKMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)#
Bases: KMeans
Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized with a regular KMeans and continues with a modified version of it.
Computing the predictions offers a choice. The first option is to keep the predictions of the regular k-means algorithm, applied to the balanced clusters. The second is to compute balanced predictions over the test set. This implies that the prediction for the same observation might change depending on the set it belongs to.
The parameter strategy determines how observations are assigned to a cluster. The value can be:
'distance': observations are ranked by their distance to a cluster; the algorithm assigns each point to the closest center unless that center has reached its maximum size; it deals with the furthest points first and maps each of them to its closest center
'gain': follows the algorithm described at
'weights': estimates a weight attached to each cluster and uses it to weight the distance to that cluster, in order to balance the number of points mapped to every cluster; this strategy uses a learning rate.
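The 'distance' strategy described above can be illustrated with a small, standard-library-only sketch. It is not the mlinsights implementation: the function name `balanced_assign`, the capacity rule `ceil(n / k)`, and the fixed centers are assumptions made for the example.

```python
import math


def euclid(p, c):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))


def balanced_assign(points, centers):
    """Greedy equal-size assignment (hypothetical helper, for illustration)."""
    n, k = len(points), len(centers)
    max_size = math.ceil(n / k)  # assumed maximum size of a cluster
    dist = [[euclid(p, c) for c in centers] for p in points]
    # Deal first with the points that are furthest from their closest center.
    order = sorted(range(n), key=lambda i: min(dist[i]), reverse=True)
    sizes = [0] * k
    labels = [None] * n
    for i in order:
        # Pick the closest center that has not reached its maximum size.
        for j in sorted(range(k), key=lambda j: dist[i][j]):
            if sizes[j] < max_size:
                labels[i] = j
                sizes[j] += 1
                break
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (5.0, 0.0), (5.1, 0.0)]
centers = [(0.15, 0.0), (5.05, 0.0)]
labels = balanced_assign(points, centers)
sizes = [labels.count(j) for j in range(len(centers))]
```

Four points sit near the first center and two near the second, so a regular k-means assignment would be unbalanced. The greedy pass fills the first cluster up to its capacity and pushes one remaining point to the second cluster, which is why a later correction step can still improve the inertia.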
The first two strategies cannot reach a good compromise without the function _switch_clusters, which tries every switch between clusters: two points exchange their clusters. Each switch keeps the number of points per cluster unchanged, and the function checks that the inertia is reduced.
- Parameters:
n_clusters – number of clusters
init – used by k-means
n_init – used by k-means
max_iter – used by k-means
tol – used by k-means
verbose – used by k-means
random_state – used by k-means
copy_x – used by k-means
algorithm – used by k-means
balanced_predictions – produce balanced predictions or the regular ones
strategy – strategy or algorithm used to abide by the constraint
kmeans0 – if True, applies k-means algorithm first
history – keeps centers across iterations
learning_rate – learning rate, used by strategy ‘weights’
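The switching step mentioned in the class description (two points exchange their clusters when that reduces the inertia) could look roughly like the sketch below. It is an assumption-laden illustration: `switch_pass` is a hypothetical name, the centers are kept fixed during the pass, and the actual _switch_clusters function may proceed differently.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def switch_pass(points, centers, labels):
    """One pass over all pairs of points (hypothetical helper).

    Two points in different clusters exchange their labels when this
    lowers the sum of squared distances to the (fixed) centers.
    Cluster sizes are unchanged because labels are only swapped.
    """
    labels = list(labels)
    n_switch = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            a, b = labels[i], labels[j]
            if a == b:
                continue
            before = sq_dist(points[i], centers[a]) + sq_dist(points[j], centers[b])
            after = sq_dist(points[i], centers[b]) + sq_dist(points[j], centers[a])
            if after < before:
                labels[i], labels[j] = b, a
                n_switch += 1
    return labels, n_switch


points = [(0.0,), (1.0,), (9.0,), (10.0,)]
centers = [(0.5,), (9.5,)]
start = [0, 1, 0, 1]  # balanced but poorly assigned
labels, n_switch = switch_pass(points, centers, start)
```

Starting from a balanced but poor assignment, a single switch between the second and third points is enough here to reach the natural grouping while both clusters keep two points.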
- __abstractmethods__ = frozenset({})#
- __init__(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)#
- Parameters:
n_clusters – number of clusters
init – used by k-means
n_init – used by k-means
max_iter – used by k-means
tol – used by k-means
verbose – used by k-means
random_state – used by k-means
copy_x – used by k-means
algorithm – used by k-means
balanced_predictions – produce balanced predictions or the regular ones
strategy – strategy or algorithm used to abide by the constraint
kmeans0 – if True, applies k-means algorithm first
history – keeps centers across iterations
learning_rate – learning rate, used by strategy ‘weights’
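The learning_rate parameter only matters for the 'weights' strategy. The sketch below shows one way a per-cluster weight could be tuned with a learning rate to balance cluster sizes. The update rule, the stopping test, and the name `weighted_balance` are assumptions chosen for the example, not the rule used by mlinsights.

```python
import math


def euclid(p, c):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))


def weighted_balance(points, centers, learning_rate=0.5, max_iter=50):
    """Balances cluster sizes with multiplicative weights (hypothetical rule)."""
    n, k = len(points), len(centers)
    target = n / k  # ideal number of points per cluster
    weights = [1.0] * k
    labels = []
    for _ in range(max_iter):
        # Each point goes to the center with the smallest weighted distance.
        labels = [
            min(range(k), key=lambda j: weights[j] * euclid(p, centers[j]))
            for p in points
        ]
        counts = [labels.count(j) for j in range(k)]
        if max(counts) - min(counts) <= 1:
            break  # sizes are as balanced as they can be
        # Crowded clusters get a larger weight, which makes them less attractive.
        # With learning_rate < 1 the factor stays strictly positive.
        weights = [
            w * (1 + learning_rate * (c - target) / target)
            for w, c in zip(weights, counts)
        ]
    return labels, weights


points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
centers = [(1.5,), (10.5,)]
labels, weights = weighted_balance(points, centers)
```

In this toy case the first cluster starts with four points, so its weight grows until the point at 3.0 becomes cheaper to map to the second center. A learning rate that is too large can overshoot and make the sizes oscillate, which is one reason such a strategy exposes it as a parameter.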
- _abc_impl = <_abc._abc_data object>#
- _strategy_value = {'distance', 'gain', 'weights'}#
- constraint_kmeans(X, sample_weight=None, state=None, learning_rate=1.0, history=False, fLOG=None)#
Completes the constraint k-means.
- Parameters:
X – features
sample_weight – sample weight
state – state
learning_rate – learning rate, used by strategy ‘weights’
history – keeps evolution of centers
fLOG – logging function
- fit(X, y=None, sample_weight=None, fLOG=None)#
Compute k-means clustering.
- Parameters:
X – array-like or sparse matrix, shape=(n_samples, n_features). Training instances to cluster. Note that the data is converted to C ordering, which causes a memory copy if the given data is not C-contiguous.
y – Ignored
sample_weight – sample weight
fLOG – logging function
- predict(X, sample_weight=None)#
Computes the predictions.
- Parameters:
X – features.
- Returns:
prediction
- score(X, y=None, sample_weight=None)#
Returns the distances to all clusters.
- Parameters:
X – features
y – unused
sample_weight – sample weight
- Returns:
distances
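As a rough picture of what "distances to all clusters" means, the sketch below builds a table with one row per observation and one column per center. The layout (n_samples rows, n_clusters columns) and the helper name `distances_to_centers` are assumptions for illustration only; the source does not state the shape of the returned value.

```python
import math


def distances_to_centers(points, centers):
    """Euclidean distance from every point to every center (hypothetical helper)."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c))) for c in centers]
        for p in points
    ]


points = [(0.0, 0.0), (3.0, 4.0)]
centers = [(0.0, 0.0), (3.0, 0.0)]
dist = distances_to_centers(points, centers)
# The index of the smallest distance in a row is the closest center.
closest = [row.index(min(row)) for row in dist]
```

Taking the closest center per row gives the regular, unbalanced prediction; a balanced prediction would additionally have to respect the cluster sizes.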
- transform(X)#
Computes the predictions.
- Parameters:
X – features.
- Returns:
prediction