.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_digitize.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_digitize.py:

.. _l-example-digitize:

========================
numpy.digitize as a tree
========================

.. index:: digitize, decision tree, onnx, onnxruntime

Function :epkg:`numpy:digitize` transforms a real variable
into a discrete one by returning the index of the bucket
each value falls into. That bucket can be retrieved efficiently
with a binary search over the bins, which is equivalent to
evaluating a decision tree.
Function :func:`digitize2tree ` builds that tree.

.. contents::
    :local:

Simple example
==============

.. GENERATED FROM PYTHON SOURCE LINES 24-42

.. code-block:: default


    import warnings
    import numpy
    from pandas import DataFrame, pivot, pivot_table
    import matplotlib.pyplot as plt
    from onnxruntime import InferenceSession
    from sklearn.tree import export_text
    from skl2onnx import to_onnx
    from cpyquickhelper.numbers.speed_measure import measure_time
    from mlinsights.mltree import digitize2tree
    from tqdm import tqdm

    x = numpy.array([0.2, 6.4, 3.0, 1.6])
    bins = numpy.array([0.0, 1.0, 2.5, 4.0, 7.0])
    expected = numpy.digitize(x, bins, right=True)
    tree = digitize2tree(bins, right=True)
    pred = tree.predict(x.reshape((-1, 1)))
    print(expected, pred)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [1 4 3 2] [1. 4. 3. 2.]


.. GENERATED FROM PYTHON SOURCE LINES 43-44

The tree looks like the following.

.. GENERATED FROM PYTHON SOURCE LINES 44-46

.. code-block:: default

    print(export_text(tree, feature_names=['x']))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    |--- x <= 2.50
    |   |--- x <= 1.00
    |   |   |--- x <= 0.00
    |   |   |   |--- value: [0.00]
    |   |   |--- x >  0.00
    |   |   |   |--- value: [1.00]
    |   |--- x >  1.00
    |   |   |--- value: [2.00]
    |--- x >  2.50
    |   |--- x <= 4.00
    |   |   |--- x <= 2.50
    |   |   |   |--- value: [2.00]
    |   |   |--- x >  2.50
    |   |   |   |--- value: [3.00]
    |   |--- x >  4.00
    |   |   |--- x <= 7.00
    |   |   |   |--- x <= 4.00
    |   |   |   |   |--- value: [3.00]
    |   |   |   |--- x >  4.00
    |   |   |   |   |--- value: [4.00]
    |   |   |--- x >  7.00
    |   |   |   |--- value: [5.00]


.. GENERATED FROM PYTHON SOURCE LINES 47-54

Benchmark
=========

Let's measure the processing time. *numpy* should be much faster
than *scikit-learn*, which performs many checks on its input
before predicting. The benchmark also converts the tree into ONNX
and measures the processing time with :epkg:`onnxruntime`.

.. GENERATED FROM PYTHON SOURCE LINES 54-109

.. code-block:: default


    obs = []

    for shape in tqdm([1, 10, 100, 1000, 10000, 100000]):
        x = numpy.random.random(shape).astype(numpy.float32)
        if shape < 1000:
            repeat = number = 100
        else:
            repeat = number = 10

        for n_bins in [1, 10, 100]:
            bins = (numpy.arange(n_bins) / n_bins).astype(numpy.float32)

            ti = measure_time(
                "numpy.digitize(x, bins, right=True)",
                context={'numpy': numpy, "x": x, "bins": bins},
                div_by_number=True, repeat=repeat, number=number)
            ti['name'] = 'numpy'
            ti['n_bins'] = n_bins
            ti['shape'] = shape
            obs.append(ti)

            tree = digitize2tree(bins, right=True)

            ti = measure_time(
                "tree.predict(x)",
                context={'numpy': numpy, "x": x.reshape((-1, 1)), "tree": tree},
                div_by_number=True, repeat=repeat, number=number)
            ti['name'] = 'sklearn'
            ti['n_bins'] = n_bins
            ti['shape'] = shape
            obs.append(ti)

            with warnings.catch_warnings():
                warnings.simplefilter("ignore", category=FutureWarning)
                onx = to_onnx(tree, x.reshape((-1, 1)), target_opset=15)

            sess = InferenceSession(onx.SerializeToString())

            ti = measure_time(
                "sess.run(None, {'X': x})",
                context={'numpy': numpy, "x": x.reshape((-1, 1)), "sess": sess},
                div_by_number=True, repeat=repeat, number=number)
            ti['name'] = 'ort'
            ti['n_bins'] = n_bins
            ti['shape'] = shape
            obs.append(ti)

    df = DataFrame(obs)
    piv = pivot_table(data=df, index="shape", columns=["n_bins", "name"],
                      values=["average"])
    print(piv)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0%| | 0/6 [00:00


.. container:: sphx-glr-download sphx-glr-download-jupyter

    :download:`Download Jupyter notebook: plot_digitize.ipynb `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
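The introduction states that the bucket lookup amounts to a binary search over the bins. The sketch below illustrates that idea in pure Python, reusing the values and bin edges of the simple example. The helper ``digitize_right`` is a hypothetical name introduced only for this illustration. It is not part of numpy or mlinsights, and it assumes sorted, increasing bins with the ``right=True`` convention.

```python
from bisect import bisect_left

# Values and bin edges copied from the simple example of this page.
BINS = [0.0, 1.0, 2.5, 4.0, 7.0]
X = [0.2, 6.4, 3.0, 1.6]


def digitize_right(value, bins):
    """Hypothetical helper: return i such that bins[i-1] < value <= bins[i].

    Assumes `bins` is sorted in increasing order (right=True convention).
    """
    lo, hi = 0, len(bins)
    while lo < hi:
        mid = (lo + hi) // 2
        # Each comparison plays the role of one "x <= threshold" tree node.
        if value <= bins[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo


indices = [digitize_right(v, BINS) for v in X]
# Cross-check against the standard library's binary search.
assert indices == [bisect_left(BINS, v) for v in X]
print(indices)
```

Each loop iteration compares the value with one threshold, like a single node of the tree printed by ``export_text``, so the number of comparisons grows with the logarithm of the number of bins. For these inputs the sketch yields the same indices as the ``numpy.digitize`` call in the simple example.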