.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_constraint_kmeans.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_constraint_kmeans.py:


=================
Constraint KMeans
=================

A simple example showing how to cluster data while keeping
approximately the same number of points in every cluster.

.. contents::
    :local:

Data
====

.. GENERATED FROM PYTHON SOURCE LINES 16-37

.. code-block:: default


    from collections import Counter
    import numpy
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans
    from mlinsights.mlmodel import ConstraintKMeans

    # Two blobs of 50 points each in the lower-left box.
    n_samples = 100
    data = make_blobs(
        n_samples=n_samples, n_features=2, centers=2, cluster_std=1.0,
        center_box=(-10.0, 0.0), shuffle=True, random_state=2)
    X1 = data[0]
    # Two blobs of 25 points each in the upper-right box.
    data = make_blobs(
        n_samples=n_samples // 2, n_features=2, centers=2, cluster_std=1.0,
        center_box=(0.0, 10.0), shuffle=True, random_state=2)
    X2 = data[0]
    X = numpy.vstack([X1, X2])
    X.shape





.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    (150, 2)



.. GENERATED FROM PYTHON SOURCE LINES 38-39

Plot the data.

.. GENERATED FROM PYTHON SOURCE LINES 39-44

.. code-block:: default


    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    ax.plot(X[:, 0], X[:, 1], '.')
    ax.set_title('4 clusters')




.. image-sg:: /gyexamples/images/sphx_glr_plot_constraint_kmeans_001.png
   :alt: 4 clusters
   :srcset: /gyexamples/images/sphx_glr_plot_constraint_kmeans_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Text(0.5, 1.0, '4 clusters')



.. GENERATED FROM PYTHON SOURCE LINES 45-47

Standard KMeans
===============

.. GENERATED FROM PYTHON SOURCE LINES 47-63

.. code-block:: default


    km = KMeans(n_clusters=4)
    km.fit(X)
    cl = km.predict(X)
    # Number of points assigned to each cluster.
    hist = Counter(cl)

    colors = 'brgy'
    fig, ax = plt.subplots(1, 1, figsize=(4, 4))
    for i in range(0, max(cl) + 1):
        ax.plot(X[cl == i, 0], X[cl == i, 1], colors[i] + '.', label='cl%d' % i)
        # Mark the center of cluster i with a '+'.
        x = [km.cluster_centers_[i, 0], km.cluster_centers_[i, 0]]
        y = [km.cluster_centers_[i, 1], km.cluster_centers_[i, 1]]
        ax.plot(x, y, colors[i] + '+')
    ax.set_title(f'KMeans 4 clusters\n{hist!r}')
    ax.legend()




.. image-sg:: /gyexamples/images/sphx_glr_plot_constraint_kmeans_002.png
   :alt: KMeans 4 clusters Counter({2: 50, 0: 50, 1: 27, 3: 23})
   :srcset: /gyexamples/images/sphx_glr_plot_constraint_kmeans_002.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 64-66

Constraint KMeans
=================

.. GENERATED FROM PYTHON SOURCE LINES 66-75

.. code-block:: default


    km1 = ConstraintKMeans(n_clusters=4, strategy='gain',
                           balanced_predictions=True)
    km1.fit(X)

    km2 = ConstraintKMeans(n_clusters=4, strategy='distance',
                           balanced_predictions=True)
    km2.fit(X)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    somewhere/workspace/mlinsights/mlinsights_UT_39_std/_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py:1316: FutureWarning: algorithm='auto' is deprecated, it will be removed in 1.3. Using 'lloyd' instead.
      warnings.warn(
    somewhere/workspace/mlinsights/mlinsights_UT_39_std/_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py:1316: FutureWarning: algorithm='auto' is deprecated, it will be removed in 1.3. Using 'lloyd' instead.
      warnings.warn(


.. raw:: html

    <pre>ConstraintKMeans(balanced_predictions=True, n_clusters=4, strategy='distance')</pre>


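To make "approximately the same number of points" concrete, the histogram reported above for the standard KMeans can be compared with the ideal cluster size, ``n_samples / n_clusters``. The snippet below is a minimal sketch using only the standard library; the helper name ``size_imbalance`` is hypothetical and is not part of mlinsights or scikit-learn.

```python
from collections import Counter


def size_imbalance(labels, n_clusters):
    """Return (ideal cluster size, largest absolute deviation from it).

    Hypothetical helper written for illustration only;
    it is not part of mlinsights.
    """
    hist = Counter(labels)
    ideal = len(labels) / n_clusters
    # A cluster that received no point counts as size 0.
    worst = max(abs(hist.get(k, 0) - ideal) for k in range(n_clusters))
    return ideal, worst


# Labels rebuilt from the histogram shown for the standard KMeans:
# Counter({2: 50, 0: 50, 1: 27, 3: 23})
labels = [0] * 50 + [1] * 27 + [2] * 50 + [3] * 23
ideal, worst = size_imbalance(labels, n_clusters=4)
print(ideal, worst)  # 37.5 14.5
```

The same helper should accept the arrays returned by ``predict`` in this example, so the effect of the balancing constraint can be read off as a single number: the closer the deviation is to zero, the more even the clusters.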
.. GENERATED FROM PYTHON SOURCE LINES 76-78

This algorithm tries to balance the clusters by exchanging points
between them.

.. GENERATED FROM PYTHON SOURCE LINES 78-103

.. code-block:: default


    cl1 = km1.predict(X)
    hist1 = Counter(cl1)

    cl2 = km2.predict(X)
    hist2 = Counter(cl2)

    # Left: strategy='gain', right: strategy='distance'.
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    for i in range(0, max(cl1) + 1):
        ax[0].plot(X[cl1 == i, 0], X[cl1 == i, 1],
                   colors[i] + '.', label='cl%d' % i)
        ax[1].plot(X[cl2 == i, 0], X[cl2 == i, 1],
                   colors[i] + '.', label='cl%d' % i)
        x = [km1.cluster_centers_[i, 0], km1.cluster_centers_[i, 0]]
        y = [km1.cluster_centers_[i, 1], km1.cluster_centers_[i, 1]]
        ax[0].plot(x, y, colors[i] + '+')
        x = [km2.cluster_centers_[i, 0], km2.cluster_centers_[i, 0]]
        y = [km2.cluster_centers_[i, 1], km2.cluster_centers_[i, 1]]
        ax[1].plot(x, y, colors[i] + '+')
    ax[0].set_title(f'ConstraintKMeans 4 clusters (gains)\n{hist1!r}')
    ax[0].legend()
    ax[1].set_title(f'ConstraintKMeans 4 clusters (distances)\n{hist2!r}')
    ax[1].legend()




.. image-sg:: /gyexamples/images/sphx_glr_plot_constraint_kmeans_003.png
   :alt: ConstraintKMeans 4 clusters (gains) Counter({2: 39, 1: 37, 3: 37, 0: 37}), ConstraintKMeans 4 clusters (distances) Counter({0: 38, 2: 38, 3: 37, 1: 37})
   :srcset: /gyexamples/images/sphx_glr_plot_constraint_kmeans_003.png
   :class: sphx-glr-single-img



.. GENERATED FROM PYTHON SOURCE LINES 104-106

Another algorithm, selected with ``strategy='weights'``, tries to extend
the area of attraction of each cluster.

.. GENERATED FROM PYTHON SOURCE LINES 106-114

.. code-block:: default


    km = ConstraintKMeans(n_clusters=4, strategy='weights',
                          max_iter=1000, history=True)
    km.fit(X)

    cl = km.predict(X)
    hist = Counter(cl)





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    somewhere/workspace/mlinsights/mlinsights_UT_39_std/_venv/lib/python3.9/site-packages/sklearn/cluster/_kmeans.py:1316: FutureWarning: algorithm='auto' is deprecated, it will be removed in 1.3. Using 'lloyd' instead.
      warnings.warn(



.. GENERATED FROM PYTHON SOURCE LINES 115-116

Let's plot the Delaunay edges as well.

.. GENERATED FROM PYTHON SOURCE LINES 116-146

.. code-block:: default



    def plot_delaunay(ax, edges, points):
        # Draw each edge as a dashed grey segment between two centers.
        for a, b in edges:
            ax.plot(points[[a, b], 0], points[[a, b], 1], '--', color="#555555")


    edges = km.cluster_edges()

    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    for i in range(0, max(cl) + 1):
        ax[0].plot(X[cl == i, 0], X[cl == i, 1], colors[i] + '.', label='cl%d' % i)
        x = [km.cluster_centers_[i, 0], km.cluster_centers_[i, 0]]
        y = [km.cluster_centers_[i, 1], km.cluster_centers_[i, 1]]
        ax[0].plot(x, y, colors[i] + '+')
    ax[0].set_title(f"ConstraintKMeans 4 clusters\nstrategy='weights'\n{hist!r}")
    ax[0].legend()

    # Successive positions of the centers; the last axis indexes iterations.
    cls = km.cluster_centers_iter_
    ax[1].plot(X[:, 0], X[:, 1], '.', label='X', color='#AAAAAA', ms=3)
    for i in range(0, max(cl) + 1):
        # Marker size grows with the iteration number.
        ms = numpy.arange(
            cls.shape[-1]).astype(numpy.float64) / cls.shape[-1] * 50 + 1
        ax[1].scatter(cls[i, 0, :], cls[i, 1, :], color=colors[i], s=ms, label='cl%d' % i)
    plot_delaunay(ax[1], edges, km.cluster_centers_)
    ax[1].set_title("Centers movement")

    plt.show()



.. image-sg:: /gyexamples/images/sphx_glr_plot_constraint_kmeans_004.png
   :alt: ConstraintKMeans 4 clusters strategy='weights' Counter({2: 49, 1: 49, 0: 48, 3: 4}), Centers movement
   :srcset: /gyexamples/images/sphx_glr_plot_constraint_kmeans_004.png
   :class: sphx-glr-single-img





.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 8.263 seconds)


.. _sphx_glr_download_gyexamples_plot_constraint_kmeans.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_constraint_kmeans.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_constraint_kmeans.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_