module mlmodel.kmeans_constraint#

Inheritance diagram of mlinsights.mlmodel.kmeans_constraint

Short summary#

module mlinsights.mlmodel.kmeans_constraint

Implements the ConstraintKMeans class.

source on GitHub

Classes#

class

truncated documentation

ConstraintKMeans

Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized …

Properties#

property

truncated documentation

_repr_html_

HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …

Methods#

method

truncated documentation

__init__

cluster_edges

Computes edges between clusters based on a Delaunay graph.

constraint_kmeans

Completes the constraint k-means.

fit

Compute k-means clustering.

predict

Computes the predictions.

score

Returns the distances to all clusters.

transform

Computes the predictions.

Documentation#

Implements the ConstraintKMeans class.

source on GitHub

class mlinsights.mlmodel.kmeans_constraint.ConstraintKMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)#

Bases: KMeans

Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized with a regular KMeans and continues with a modified version of it.

Computing the predictions offers a choice. The first option is to keep the predictions from the regular k-means algorithm but with the balanced clusters. The second is to compute balanced predictions over the test set. That implies that the predictions for the same observations might change depending on the set they belong to.

The parameter strategy determines how observations are assigned to a cluster. The value can be:

  • 'distance': observations are ranked by their distance to the centers; the algorithm handles the furthest points first and maps each one to the closest center that has not yet reached its maximum size

  • 'gain': follows the algorithm described in Same-size k-Means Variation,

  • 'weights': estimates a weight attached to each cluster;

    the distance to each cluster is scaled by this weight in order to balance the number of points mapped to every cluster; this strategy relies on a learning rate.

The first two strategies cannot reach a good compromise without function _switch_clusters, which tries every possible switch between clusters: two points exchange clusters. The number of points per cluster is preserved, and a switch is kept only if it reduces the inertia.

source on GitHub
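The 'distance' strategy above can be sketched as a capacity-constrained assignment. This is a minimal, hypothetical illustration in plain NumPy, not the library's implementation: the capacity rule (ceil(n / k) points per cluster) and the furthest-first ordering are assumptions based on the description above.

```python
import numpy as np

def balanced_assign(X, centers):
    """Assign each point to a cluster of capacity ceil(n / k),
    processing the points whose closest center is furthest away first
    (a rough sketch of the 'distance' strategy)."""
    n, k = X.shape[0], centers.shape[0]
    capacity = int(np.ceil(n / k))
    # pairwise squared distances, shape (n, k)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # deal with the points furthest from their closest center first
    order = np.argsort(-d.min(axis=1))
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    for i in order:
        for c in np.argsort(d[i]):      # closest center first
            if counts[c] < capacity:    # unless it is already full
                labels[i] = c
                counts[c] += 1
                break
    return labels

rng = np.random.RandomState(0)
X = rng.randn(20, 2)
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = balanced_assign(X, centers)
print(np.bincount(labels))  # both clusters get exactly 10 points
```

Even though most points sit near the first center, the capacity constraint forces a 10/10 split; the inertia is then refined by switching pairs of points as described above.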

Parameters:
  • n_clusters – number of clusters

  • init – used by k-means

  • n_init – used by k-means

  • max_iter – used by k-means

  • tol – used by k-means

  • verbose – used by k-means

  • random_state – used by k-means

  • copy_x – used by k-means

  • algorithm – used by k-means

  • balanced_predictions – produces balanced predictions or the regular ones

  • strategy – strategy or algorithm used to abide by the constraint

  • kmeans0 – if True, applies k-means algorithm first

  • history – keeps centers across iterations

  • learning_rate – learning rate, used by strategy ‘weights’

source on GitHub
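The 'weights' strategy and its learning_rate parameter can be illustrated with a small self-contained sketch: distances to overfull clusters are inflated by a multiplicative weight until cluster sizes balance out. The update rule below is a plausible assumption for illustration, not the library's code.

```python
import numpy as np

def weighted_balanced_labels(X, centers, learning_rate=0.5, n_iter=50):
    """Multiplicative weights on the distances: clusters attracting
    too many points see their distances inflated, underfull clusters
    see them deflated (a rough sketch of the 'weights' strategy)."""
    n, k = X.shape[0], centers.shape[0]
    target = n / k
    w = np.ones(k)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    for _ in range(n_iter):
        labels = np.argmin(d * w, axis=1)
        counts = np.bincount(labels, minlength=k)
        # inflate weights of overfull clusters, deflate underfull ones
        w *= 1.0 + learning_rate * (counts - target) / n
    return labels

rng = np.random.RandomState(1)
# deliberately imbalanced data: 15 points near one center, 5 near the other
X = np.vstack([rng.randn(15, 2), rng.randn(5, 2) + 4.0])
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
labels = weighted_balanced_labels(X, centers)
print(np.bincount(labels, minlength=2))
```

A larger learning_rate moves the weights faster but may oscillate around the balanced solution; a smaller one converges more smoothly, which mirrors the role of the learning_rate parameter above.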

__init__(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)#
Parameters:
  • n_clusters – number of clusters

  • init – used by k-means

  • n_init – used by k-means

  • max_iter – used by k-means

  • tol – used by k-means

  • verbose – used by k-means

  • random_state – used by k-means

  • copy_x – used by k-means

  • algorithm – used by k-means

  • balanced_predictions – produces balanced predictions or the regular ones

  • strategy – strategy or algorithm used to abide by the constraint

  • kmeans0 – if True, applies k-means algorithm first

  • history – keeps centers across iterations

  • learning_rate – learning rate, used by strategy ‘weights’

source on GitHub

_strategy_value = {'distance', 'gain', 'weights'}#
cluster_edges()#

Computes edges between clusters based on a Delaunay graph.

source on GitHub
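The idea behind cluster_edges can be sketched with scipy: build a Delaunay triangulation of the cluster centers and connect any two centers that share a simplex. This is a hypothetical illustration of the technique, not the method's actual code.

```python
import numpy as np
from scipy.spatial import Delaunay

# four cluster centers in the plane
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])

# triangulate the centers; two clusters are "neighbours" when their
# centers belong to the same triangle
tri = Delaunay(centers)
edges = set()
for simplex in tri.simplices:
    for i in range(len(simplex)):
        for j in range(i + 1, len(simplex)):
            a, b = sorted((simplex[i], simplex[j]))
            edges.add((int(a), int(b)))
print(sorted(edges))
```

With four centers in convex position the triangulation has two triangles, giving the four hull edges plus one diagonal.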

constraint_kmeans(X, sample_weight=None, state=None, learning_rate=1.0, history=False, fLOG=None)#

Completes the constraint k-means.

Parameters:
  • X – features

  • sample_weight – sample weight

  • state – state

  • learning_rate – learning rate, used by strategy 'weights'

  • history – keeps evolution of centers

  • fLOG – logging function

source on GitHub

fit(X, y=None, sample_weight=None, fLOG=None)#

Compute k-means clustering.

Parameters:
  • X – array-like or sparse matrix, shape=(n_samples, n_features) Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous.

  • y – Ignored

  • sample_weight – sample weight

  • fLOG – logging function

source on GitHub

predict(X, sample_weight=None)#

Computes the predictions.

Parameters:

X – features.

Returns:

prediction

source on GitHub

score(X, y=None, sample_weight=None)#

Returns the distances to all clusters.

Parameters:
  • X – features

  • y – unused

  • sample_weight – sample weight

Returns:

distances

source on GitHub

transform(X)#

Computes the predictions.

Parameters:

X – features.

Returns:

prediction

source on GitHub