module mlmodel._kmeans_constraint_

Short summary

module mlinsights.mlmodel._kmeans_constraint_

Implements the class ConstraintKMeans.

source on GitHub

Functions

function – truncated documentation
  • _adjust_weights – Changes weights mapped to every cluster. weights < 1 are used for big clusters, weights > 1 are used for small …

  • _compute_balance – Computes the difference in weights between clusters.

  • _compute_strategy_coefficient – Creates a matrix.

  • _constraint_association – Completes the constraint k-means.

  • _constraint_association_distance – Completes the constraint k-means, the function sorts points by distance to the closest cluster and associates …

  • _constraint_association_gain – Completes the constraint k-means. Follows the method described in Same-size k-Means Variation. …

  • _constraint_association_weights – Associates points to clusters.

  • _constraint_kmeans_weights – Runs the KMeans iterations but weights the clusters against one another.

  • _inertia – Computes total weighted inertia.

  • _labels_inertia_weights – Computes weighted inertia. It also adds a fraction of the whole inertia depending on how balanced the clusters are. …

  • _randomize_index – Randomizes index depending on the value. Swaps indexes. Modifies index in place.

  • _switch_clusters – Tries to switch clusters. Modifies labels in place.

  • constraint_kmeans – Completes the constraint k-means.

  • constraint_predictions – Computes the predictions but tries to associate the same number of points to each cluster.

  • linearize_matrix – Linearizes a matrix into a new one with 3 columns: value, row, column. The output format is similar to csr_matrix

Documentation

Implements the class ConstraintKMeans.

mlinsights.mlmodel._kmeans_constraint_._adjust_weights(X, sw, weights, labels, lr)

Changes the weights mapped to every cluster. Weights < 1 are used for big clusters, weights > 1 are used for small clusters.

Parameters:
  • X – features

  • centers – centers (documented, but not part of the signature above)

  • sw – sample weights

  • weights – cluster weights

  • lr – learning rate

  • labels – known labels

Returns:

labels
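
The rule "weights < 1 for big clusters, weights > 1 for small clusters" can be illustrated with a small sketch. It is not the library's code: the function name `adjust_weights_sketch` and the update rule (move each weight toward the ratio expected mass / actual mass, scaled by the learning rate) are assumptions chosen only to reproduce the behaviour described above, and plain Python lists stand in for arrays.

```python
def adjust_weights_sketch(weights, labels, lr, sw=None):
    """Hypothetical update: move each cluster weight toward expected / actual mass."""
    nbc = len(weights)
    if sw is None:
        sw = [1.0] * len(labels)  # unweighted points count for 1 each
    # Total sample weight currently held by each cluster.
    mass = [0.0] * nbc
    for label, w in zip(labels, sw):
        mass[label] += w
    expected = sum(mass) / nbc  # mass of a perfectly balanced cluster
    new_weights = []
    for k in range(nbc):
        # ASSUMPTION: the target is below 1 for oversized clusters, above 1 for small ones.
        target = expected / mass[k] if mass[k] > 0 else 1.0
        new_weights.append(weights[k] + lr * (target - weights[k]))
    return new_weights


# Cluster 0 holds 4 points and cluster 1 holds 2, so the expected mass is 3.
adjusted = adjust_weights_sketch([1.0, 1.0], [0, 0, 0, 0, 1, 1], lr=1.0)
print(adjusted)
```

With a learning rate below 1 the weights move only part of the way toward the target at each call, which is the usual reason for exposing such a parameter.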

mlinsights.mlmodel._kmeans_constraint_._compute_balance(X, sw, labels, nbc=None)

Computes the difference in weights between clusters.

Parameters:
  • X – features

  • sw – sample weights

  • labels – known labels

  • nbc – number of clusters

Returns:

(weights per cluster, expected weight, total weight)
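
A minimal sketch of what such a balance computation could look like, based only on the returned tuple documented above. The name `compute_balance_sketch` is hypothetical, the features X are left out because this simplified version does not need them, and "expected weight" is assumed to mean the total weight divided by the number of clusters.

```python
def compute_balance_sketch(labels, sw=None, nbc=None):
    """Hypothetical helper: total sample weight held by each cluster."""
    if nbc is None:
        nbc = max(labels) + 1  # infer the number of clusters from the labels
    if sw is None:
        sw = [1.0] * len(labels)  # unweighted points count for 1 each
    per_cluster = [0.0] * nbc
    for label, w in zip(labels, sw):
        per_cluster[label] += w
    total = sum(per_cluster)
    expected = total / nbc  # ASSUMPTION: weight of a perfectly balanced cluster
    return per_cluster, expected, total


per_cluster, expected, total = compute_balance_sketch([0, 0, 0, 1, 2, 2])
print(per_cluster, expected, total)
```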

mlinsights.mlmodel._kmeans_constraint_._compute_strategy_coefficient(distances, strategy, labels)

Creates a matrix.

mlinsights.mlmodel._kmeans_constraint_._constraint_association(leftover, counters, labels, leftclose, distances_close, centers, X, x_squared_norms, limit, strategy, state=None)

Completes the constraint k-means.

Parameters:
  • X – features

  • labels – allocated array of initialized labels (unused)

  • centers – initialized centers

  • x_squared_norms – squared norms of the rows of X

  • limit – number of points to associate per cluster

  • leftover – number of points to associate at the end

  • counters – allocated array

  • leftclose – allocated array

  • distances_close – allocated array

  • strategy – strategy used to sort points before mapping them to a cluster

  • state – random state

mlinsights.mlmodel._kmeans_constraint_._constraint_association_distance(leftover, counters, labels, leftclose, distances_close, centers, X, x_squared_norms, limit, strategy, state=None)

Completes the constraint k-means. The function sorts points by distance to the closest cluster and associates them in that order. It deals first with the furthest point and maps it to the closest center.

Parameters:
  • X – features

  • labels – allocated array of initialized labels (unused)

  • centers – initialized centers

  • x_squared_norms – squared norms of the rows of X

  • limit – number of points to associate per cluster

  • leftover – number of points to associate at the end

  • counters – allocated array

  • leftclose – allocated array

  • distances_close – allocated array

  • strategy – strategy used to sort points before mapping them to a cluster

  • state – random state (unused)
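
The ordering described above (furthest point first, each point mapped to its closest center) can be sketched as a greedy pass. This is an illustration under assumptions, not the library's implementation: `associate_by_distance_sketch` is a hypothetical name, points are plain tuples, a full cluster is simply skipped in favour of the next closest one, and `limit * len(centers)` is assumed to be at least `len(X)`.

```python
import math


def associate_by_distance_sketch(X, centers, limit):
    """Greedy balanced assignment: points far from their closest center go first."""
    dist = [[math.dist(x, c) for c in centers] for x in X]
    # Points sorted by decreasing distance to their closest center.
    order = sorted(range(len(X)), key=lambda i: min(dist[i]), reverse=True)
    counters = [0] * len(centers)  # number of points already in each cluster
    labels = [-1] * len(X)
    for i in order:
        # ASSUMPTION: fall back to the next closest center when one is full.
        for k in sorted(range(len(centers)), key=lambda c: dist[i][c]):
            if counters[k] < limit:
                labels[i] = k
                counters[k] += 1
                break
    return labels


X = [(0.0,), (1.0,), (2.0,), (3.0,), (9.0,), (10.0,)]
centers = [(1.5,), (9.5,)]
distance_labels = associate_by_distance_sketch(X, centers, limit=3)
print(distance_labels)
```

Because points close to a center are handled last, a greedy pass like this one can push such a point into a farther cluster once its preferred cluster is full.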

mlinsights.mlmodel._kmeans_constraint_._constraint_association_gain(leftover, counters, labels, leftclose, distances_close, centers, X, x_squared_norms, limit, strategy, state=None)

Completes the constraint k-means. Follows the method described in Same-size k-Means Variation.

Parameters:
  • X – features

  • labels – allocated array of initialized labels (unused)

  • centers – initialized centers

  • x_squared_norms – squared norms of the rows of X

  • limit – number of points to associate per cluster

  • leftover – number of points to associate at the end

  • counters – allocated array

  • leftclose – allocated array

  • distances_close – allocated array

  • strategy – strategy used to sort points before mapping them to a cluster

  • state – random state

See Same-size k-Means Variation.
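
As a rough illustration of a gain-based ordering, the sketch below follows one reading of the initialization step of the Same-size k-Means Variation tutorial: points are ranked by the benefit of their best cluster over their worst one, assigned to their preferred cluster until it is full, and the ranking is then recomputed without the full cluster. The name `associate_by_gain_sketch` and every implementation detail are assumptions; the library function may differ substantially.

```python
import math


def associate_by_gain_sketch(X, centers, limit):
    """Assign points by decreasing gain (worst distance minus best distance)."""
    if limit * len(centers) < len(X):
        raise ValueError("limit is too small to place every point")
    dist = [[math.dist(x, c) for c in centers] for x in X]
    counters = [0] * len(centers)
    labels = [-1] * len(X)
    remaining = set(range(len(X)))
    while remaining:
        # Clusters that can still accept points.
        open_clusters = [k for k, n in enumerate(counters) if n < limit]
        best_i, best_gain = None, -1.0
        for i in sorted(remaining):
            d = [dist[i][k] for k in open_clusters]
            gain = max(d) - min(d)  # benefit of the best open cluster over the worst
            if gain > best_gain:
                best_i, best_gain = i, gain
        k = min(open_clusters, key=lambda c: dist[best_i][c])
        labels[best_i] = k
        counters[k] += 1
        remaining.remove(best_i)
    return labels


X = [(0.0,), (1.0,), (2.0,), (3.0,), (9.0,), (10.0,)]
centers = [(1.5,), (9.5,)]
gain_labels = associate_by_gain_sketch(X, centers, limit=3)
print(gain_labels)
```

Ranking by gain rather than by raw distance means the point that loses least from being moved is the one that ends up in the other cluster.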

mlinsights.mlmodel._kmeans_constraint_._constraint_association_weights(X, centers, sw, weights)

Associates points to clusters.

Parameters:
  • X – features

  • centers – centers

  • sw – sample weights

  • weights – cluster weights

Returns:

labels
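
One way to picture an association driven by cluster weights is a nearest-center rule on rescaled distances. The sketch below is an assumption, not the library's code: `associate_with_weights_sketch` is a hypothetical name, the sample weights sw are ignored, and the distance to a center is divided by the cluster weight so that a cluster with a weight below 1 looks farther away and attracts fewer points.

```python
import math


def associate_with_weights_sketch(X, centers, weights):
    """Nearest-center assignment on distances rescaled by cluster weights."""
    labels = []
    for x in X:
        # ASSUMPTION: dividing by the weight penalises clusters whose weight is < 1.
        scaled = [math.dist(x, c) / w for c, w in zip(centers, weights)]
        labels.append(scaled.index(min(scaled)))
    return labels


X = [(0.0,), (1.0,), (2.0,), (3.0,), (9.0,), (10.0,)]
centers = [(1.5,), (9.5,)]
plain_labels = associate_with_weights_sketch(X, centers, [1.0, 1.0])
weighted_labels = associate_with_weights_sketch(X, centers, [0.4, 2.0])
print(plain_labels)
print(weighted_labels)
```

With equal weights the first cluster receives four of the six points; lowering its weight moves the point at 3.0 to the second cluster.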

mlinsights.mlmodel._kmeans_constraint_._constraint_kmeans_weights(X, labels, sample_weight, centers, inertia, it, max_iter, verbose=0, state=None, learning_rate=1.0, history=False, fLOG=None)

Runs the KMeans iterations but weights the clusters against one another.

Parameters:
  • X – features

  • labels – initialized labels (unused)

  • sample_weight – sample weights

  • centers – initialized centers

  • inertia – initialized inertia (unused)

  • it – number of iterations already done

  • max_iter – maximum number of iterations

  • verbose – verbosity level

  • state – random state

  • learning_rate – learning rate

  • history – keeps all centers across iterations

  • fLOG – logging function (must be specified, otherwise verbose has no effect)

Returns:

tuple (best_labels, best_centers, best_inertia, weights, it)

mlinsights.mlmodel._kmeans_constraint_._inertia(X, sw)

Computes total weighted inertia.

Parameters:
  • X – features

  • sw – sample weights

Returns:

inertia
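
For reference, a total weighted inertia is commonly defined as the weighted sum of squared distances to the weighted barycenter. The sketch below implements that textbook definition with plain Python; `total_inertia_sketch` is a hypothetical name and the library may compute the quantity differently.

```python
def total_inertia_sketch(X, sw=None):
    """Weighted sum of squared distances to the weighted barycenter of X."""
    if sw is None:
        sw = [1.0] * len(X)  # unweighted points count for 1 each
    total_w = sum(sw)
    dim = len(X[0])
    # Weighted barycenter, one coordinate at a time.
    center = [sum(w * x[j] for x, w in zip(X, sw)) / total_w for j in range(dim)]
    return sum(
        w * sum((x[j] - center[j]) ** 2 for j in range(dim))
        for x, w in zip(X, sw)
    )


points = [(0.0, 0.0), (2.0, 0.0)]
inertia_plain = total_inertia_sketch(points)
inertia_weighted = total_inertia_sketch(points, sw=[1.0, 3.0])
print(inertia_plain, inertia_weighted)
```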

mlinsights.mlmodel._kmeans_constraint_._labels_inertia_weights(X, centers, sw, weights, labels, total_inertia)

Computes weighted inertia. It also adds a fraction of the whole inertia depending on how balanced the clusters are.

Parameters:
  • X – features

  • centers – centers

  • sw – sample weights

  • weights – cluster weights

  • labels – labels

  • total_inertia – total inertia

Returns:

inertia

mlinsights.mlmodel._kmeans_constraint_._randomize_index(index, weights)

Randomizes index depending on the value. Swaps indexes. Modifies index in place.

mlinsights.mlmodel._kmeans_constraint_._switch_clusters(labels, distances)

Tries to switch clusters. Modifies labels in place.

Parameters:
  • labels – labels

  • distances – distances

mlinsights.mlmodel._kmeans_constraint_.constraint_kmeans(X, labels, sample_weight, centers, inertia, iter, max_iter, strategy='gain', verbose=0, state=None, learning_rate=1.0, history=False, fLOG=None)

Completes the constraint k-means.

Parameters:
  • X – features

  • labels – initialized labels (unused)

  • sample_weight – sample weights

  • centers – initialized centers

  • inertia – initialized inertia (unused)

  • iter – number of iterations already done

  • max_iter – maximum number of iterations

  • strategy – strategy used to sort observations before mapping them to clusters

  • verbose – verbosity level

  • state – random state

  • learning_rate – used by strategy ‘weights’

  • history – returns the list of centers across iterations

  • fLOG – logging function (must be specified, otherwise verbose has no effect)

Returns:

tuple (best_labels, best_centers, best_inertia, iter, all_centers)

mlinsights.mlmodel._kmeans_constraint_.constraint_predictions(X, centers, strategy, state=None)

Computes the predictions but tries to associate the same number of points to each cluster.

Parameters:
  • X – features

  • centers – centers of each cluster

  • strategy – strategy used to sort points before mapping them to a cluster

  • state – random state

Returns:

labels, distances, distances_close

mlinsights.mlmodel._kmeans_constraint_.linearize_matrix(mat, *adds)

Linearizes a matrix into a new one with 3 columns: value, row, column. The output format is similar to csr_matrix but null values are kept.

Parameters:
  • mat – matrix

  • adds – additional square matrices

Returns:

new matrix

adds defines additional matrices: the function appends one column on the right side per additional matrix and fills it with the corresponding values taken from that matrix.
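
The layout described above can be sketched with nested lists. This is a simplified stand-in rather than the library function: `linearize_matrix_sketch` is a hypothetical name, it handles dense list-of-lists input only, and it assumes every additional matrix has the same shape as mat.

```python
def linearize_matrix_sketch(mat, *adds):
    """Flatten a dense matrix into rows [value, row, column, extra...]."""
    rows = []
    for i, row in enumerate(mat):
        for j, value in enumerate(row):
            # One extra column per additional matrix, read at the same position.
            extra = [add[i][j] for add in adds]
            rows.append([value, i, j] + extra)  # null values are kept
    return rows


mat = [[1.0, 0.0], [0.0, 4.0]]
other = [[10, 20], [30, 40]]
linearized = linearize_matrix_sketch(mat, other)
for line in linearized:
    print(line)
# [1.0, 0, 0, 10]
# [0.0, 0, 1, 20]
# [0.0, 1, 0, 30]
# [4.0, 1, 1, 40]
```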
