module mlmodel.kmeans_constraint#

Inheritance diagram of mlinsights.mlmodel.kmeans_constraint

Short summary#

module mlinsights.mlmodel.kmeans_constraint

Implements the ConstraintKMeans class.

source on GitHub

Classes#

class

truncated documentation

ConstraintKMeans

Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized …

Properties#

property

truncated documentation

_repr_html_

HTML representation of estimator. This is redundant with the logic of _repr_mimebundle_. The latter should …

Methods#

method

truncated documentation

__init__

cluster_edges

Computes edges between clusters based on a Delaunay graph.

constraint_kmeans

Completes the constraint k-means.

fit

Compute k-means clustering.

predict

Computes the predictions.

score

Returns the distances to all clusters.

transform

Computes the predictions.

Documentation#

Implements the ConstraintKMeans class.

source on GitHub

class mlinsights.mlmodel.kmeans_constraint.ConstraintKMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)#

Bases: KMeans

Defines a constraint k-means. Clusters are modified to have an equal size. The algorithm is initialized with a regular KMeans and continues with a modified version of it.

Computing the predictions offers a choice. The first option is to keep the predictions from the regular k-means algorithm but with the balanced clusters. The second is to compute balanced predictions over the test set. That implies that the predictions for the same observations might change depending on the set they belong to.

The parameter strategy determines how observations are assigned to a cluster. The value can be:

  • 'distance': observations are ranked by their distance to the centers; the algorithm handles the furthest points first and maps each one to the closest center that has not yet reached its maximum size

  • 'gain': follows the algorithm described in Same-size k-Means Variation,

  • 'weights': estimates a weight attached to each cluster;

    the distance to each cluster is scaled by this weight in order to balance the number of points mapped to every cluster; this strategy relies on a learning rate.

The first two strategies cannot reach a good compromise without function _switch_clusters, which tries every possible switch between clusters: two points exchange clusters. The number of points per cluster is preserved, and a switch is kept only if it reduces the inertia.

source on GitHub
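The 'distance' strategy above can be sketched as a capacity-constrained assignment. This is a minimal, hypothetical illustration in plain NumPy, not the library's implementation: the capacity rule (ceil(n / k) points per cluster) and the furthest-first ordering are assumptions based on the description above.

```python
import numpy as np

def balanced_assign(X, centers):
    """Assign each point to a cluster of capacity ceil(n / k),
    processing the points whose closest center is furthest away first
    (a rough sketch of the 'distance' strategy)."""
    n, k = X.shape[0], centers.shape[0]
    capacity = int(np.ceil(n / k))
    # pairwise squared distances, shape (n, k)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # deal with the points furthest from their closest center first
    order = np.argsort(-d.min(axis=1))
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    for i in order:
        for c in np.argsort(d[i]):      # closest center first
            if counts[c] < capacity:    # unless it is already full
                labels[i] = c
                counts[c] += 1
                break
    return labels

rng = np.random.RandomState(0)
X = rng.randn(20, 2)
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = balanced_assign(X, centers)
print(np.bincount(labels))  # both clusters get exactly 10 points
```

Even though most points sit near the first center, the capacity constraint forces a 10/10 split; the inertia is then refined by switching pairs of points as described above.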

Parameters:
  • n_clusters – number of clusters

  • init – used by k-means

  • n_init – used by k-means

  • max_iter – used by k-means

  • tol – used by k-means

  • verbose – used by k-means

  • random_state – used by k-means

  • copy_x – used by k-means

  • algorithm – used by k-means

  • balanced_predictions – produces balanced predictions or the regular ones

  • strategy – strategy or algorithm used to abide by the constraint

  • kmeans0 – if True, applies k-means algorithm first

  • history – keeps centers across iterations

  • learning_rate – learning rate, used by strategy ‘weights’

source on GitHub
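The 'weights' strategy and its learning_rate parameter can be illustrated with a small self-contained sketch: distances to overfull clusters are inflated by a multiplicative weight until cluster sizes balance out. The update rule below is a plausible assumption for illustration, not the library's code.

```python
import numpy as np

def weighted_balanced_labels(X, centers, learning_rate=0.5, n_iter=50):
    """Multiplicative weights on the distances: clusters attracting
    too many points see their distances inflated, underfull clusters
    see them deflated (a rough sketch of the 'weights' strategy)."""
    n, k = X.shape[0], centers.shape[0]
    target = n / k
    w = np.ones(k)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    for _ in range(n_iter):
        labels = np.argmin(d * w, axis=1)
        counts = np.bincount(labels, minlength=k)
        # inflate weights of overfull clusters, deflate underfull ones
        w *= 1.0 + learning_rate * (counts - target) / n
    return labels

rng = np.random.RandomState(1)
# deliberately imbalanced data: 15 points near one center, 5 near the other
X = np.vstack([rng.randn(15, 2), rng.randn(5, 2) + 4.0])
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
labels = weighted_balanced_labels(X, centers)
print(np.bincount(labels, minlength=2))
```

A larger learning_rate moves the weights faster but may oscillate around the balanced solution; a smaller one converges more smoothly, which mirrors the role of the learning_rate parameter above.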

__init__(n_clusters=8, init='k-means++', n_init=10, max_iter=500, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='auto', balanced_predictions=False, strategy='gain', kmeans0=True, learning_rate=1.0, history=False)#
Parameters:
  • n_clusters – number of clusters

  • init – used by k-means

  • n_init – used by k-means

  • max_iter – used by k-means

  • tol – used by k-means

  • verbose – used by k-means

  • random_state – used by k-means

  • copy_x – used by k-means

  • algorithm – used by k-means

  • balanced_predictions – produces balanced predictions or the regular ones

  • strategy – strategy or algorithm used to abide by the constraint

  • kmeans0 – if True, applies k-means algorithm first

  • history – keeps centers across iterations

  • learning_rate – learning rate, used by strategy ‘weights’

source on GitHub

_strategy_value = {'distance', 'gain', 'weights'}#
cluster_edges()#

Computes edges between clusters based on a Delaunay graph.

source on GitHub
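The idea behind cluster_edges can be sketched with scipy: build a Delaunay triangulation of the cluster centers and connect any two centers that share a simplex. This is a hypothetical illustration of the technique, not the method's actual code.

```python
import numpy as np
from scipy.spatial import Delaunay

# four cluster centers in the plane
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])

# triangulate the centers; two clusters are "neighbours" when their
# centers belong to the same triangle
tri = Delaunay(centers)
edges = set()
for simplex in tri.simplices:
    for i in range(len(simplex)):
        for j in range(i + 1, len(simplex)):
            a, b = sorted((simplex[i], simplex[j]))
            edges.add((int(a), int(b)))
print(sorted(edges))
```

With four centers in convex position the triangulation has two triangles, giving the four hull edges plus one diagonal.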

constraint_kmeans(X, sample_weight=None, state=None, learning_rate=1.0, history=False, fLOG=None)#

Completes the constraint k-means.

Parameters:
  • X – features

  • sample_weight – sample weight

  • state – state

  • learning_rate – learning rate, used by strategy 'weights'

  • history – keeps evolution of centers

  • fLOG – logging function

source on GitHub

fit(X, y=None, sample_weight=None, fLOG=None)#

Compute k-means clustering.

Parameters:
  • X – array-like or sparse matrix, shape=(n_samples, n_features) Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous.

  • y – Ignored

  • sample_weight – sample weight

  • fLOG – logging function

source on GitHub

predict(X, sample_weight=None)#

Computes the predictions.

Parameters:

X – features.

Returns:

prediction

source on GitHub

score(X, y=None, sample_weight=None)#

Returns the distances to all clusters.

Parameters:
  • X – features

  • y – unused

  • sample_weight – sample weight

Returns:

distances

source on GitHub

transform(X)#

Computes the predictions.

Parameters:

X – features.

Returns:

prediction

source on GitHub