.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_bbegin_measure_time.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_bbegin_measure_time.py:

Benchmark ONNX conversion
=========================

.. index:: benchmark

Example :ref:`l-simple-deploy-1` converts a simple model.
This example builds a similar model on random data and compares
the processing time each option requires to compute predictions.

.. contents::
    :local:

Training a pipeline
+++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 19-48

.. code-block:: default

    import numpy
    from pandas import DataFrame
    from tqdm import tqdm
    from sklearn import config_context
    from sklearn.datasets import make_regression
    from sklearn.ensemble import (
        GradientBoostingRegressor, RandomForestRegressor, VotingRegressor)
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from mlprodict.onnxrt import OnnxInference
    from onnxruntime import InferenceSession
    from skl2onnx import to_onnx
    from onnxcustom.utils import measure_time

    N = 11000
    X, y = make_regression(N, n_features=10)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.01)
    print("Train shape", X_train.shape)
    print("Test shape", X_test.shape)

    reg1 = GradientBoostingRegressor(random_state=1)
    reg2 = RandomForestRegressor(random_state=1)
    reg3 = LinearRegression()
    ereg = VotingRegressor([('gb', reg1), ('rf', reg2), ('lr', reg3)])
    ereg.fit(X_train, y_train)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Train shape (110, 10)
    Test shape (10890, 10)

.. code-block:: none

    VotingRegressor(estimators=[('gb', GradientBoostingRegressor(random_state=1)),
                                ('rf', RandomForestRegressor(random_state=1)),
                                ('lr', LinearRegression())])
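With default (uniform) weights, ``VotingRegressor`` predicts the plain average of its estimators' predictions. A minimal pure-Python sketch of that averaging step; the helper name ``voting_predict`` is made up for illustration:

```python
def voting_predict(per_estimator_predictions):
    # Uniformly weighted voting: average the predictions of every
    # fitted estimator, observation by observation.
    return [sum(preds) / len(preds)
            for preds in zip(*per_estimator_predictions)]

# Three estimators, two observations each.
print(voting_predict([[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]]))  # [2.0, 2.0]
```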


.. GENERATED FROM PYTHON SOURCE LINES 49-59

Measure the processing time
+++++++++++++++++++++++++++

We use function :func:`measure_time `.
The page about `assume_finite `_
may be useful if you need to optimize the prediction.
We measure the processing time per observation, whether the observation
belongs to a batch or is predicted alone.

.. GENERATED FROM PYTHON SOURCE LINES 59-76

.. code-block:: default

    sizes = [(1, 50), (10, 50), (1000, 10), (10000, 5)]

    with config_context(assume_finite=True):
        obs = []
        for batch_size, repeat in tqdm(sizes):
            context = {"ereg": ereg, 'X': X_test[:batch_size]}
            mt = measure_time(
                "ereg.predict(X)", context, div_by_number=True,
                number=10, repeat=repeat)
            mt['size'] = context['X'].shape[0]
            mt['mean_obs'] = mt['average'] / mt['size']
            obs.append(mt)

    df_skl = DataFrame(obs)
    df_skl

.. rst-class:: sphx-glr-script-out

.. code-block:: none

        average  deviation  min_exec  max_exec  repeat  number   size  mean_obs
    0  0.029989   0.000107  0.029839  0.030542      50      10      1  0.029989
    1  0.029418   0.000125  0.029245  0.030009      50      10     10  0.002942
    2  0.046801   0.003460  0.043296  0.053204      10      10   1000  0.000047
    3  0.162123   0.000081  0.162050  0.162258       5      10  10000  0.000016
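The ``measure_time`` helper comes from onnxcustom. A rough stdlib-only sketch of what such a helper computes, built on :func:`timeit.repeat`; the function name ``measure_time_sketch`` is hypothetical and the real implementation may differ:

```python
import statistics
import timeit


def measure_time_sketch(stmt, context, repeat=10, number=10):
    # Run `stmt` `number` times per repetition, `repeat` repetitions,
    # then report per-execution statistics (the div_by_number behaviour).
    timings = [t / number
               for t in timeit.repeat(stmt, globals=context,
                                      repeat=repeat, number=number)]
    return {
        "average": statistics.mean(timings),
        "deviation": statistics.pstdev(timings),
        "min_exec": min(timings),
        "max_exec": max(timings),
        "repeat": repeat,
        "number": number,
    }


stats = measure_time_sketch("sum(x)", {"x": list(range(1000))})
```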


.. GENERATED FROM PYTHON SOURCE LINES 77-78

Graph.

.. GENERATED FROM PYTHON SOURCE LINES 78-82

.. code-block:: default

    df_skl.set_index('size')[['mean_obs']].plot(
        title="scikit-learn", logx=True, logy=True)

.. image-sg:: /gyexamples/images/sphx_glr_plot_bbegin_measure_time_001.png
    :alt: scikit-learn
    :srcset: /gyexamples/images/sphx_glr_plot_bbegin_measure_time_001.png
    :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 83-88

ONNX runtime
++++++++++++

The same measures are run with the two available ONNX runtimes:
*onnxruntime* and the ``python_compiled`` runtime from *mlprodict*.

.. GENERATED FROM PYTHON SOURCE LINES 88-127

.. code-block:: default

    onx = to_onnx(ereg, X_train[:1].astype(numpy.float32),
                  target_opset={'': 14, 'ai.onnx.ml': 2})
    sess = InferenceSession(onx.SerializeToString(),
                            providers=['CPUExecutionProvider'])
    oinf = OnnxInference(onx, runtime="python_compiled")

    obs = []
    for batch_size, repeat in tqdm(sizes):

        # scikit-learn
        context = {"ereg": ereg, 'X': X_test[:batch_size].astype(numpy.float32)}
        mt = measure_time(
            "ereg.predict(X)", context, div_by_number=True,
            number=10, repeat=repeat)
        mt['size'] = context['X'].shape[0]
        mt['skl'] = mt['average'] / mt['size']

        # onnxruntime
        context = {"sess": sess, 'X': X_test[:batch_size].astype(numpy.float32)}
        mt2 = measure_time(
            "sess.run(None, {'X': X})[0]", context, div_by_number=True,
            number=10, repeat=repeat)
        mt['ort'] = mt2['average'] / mt['size']

        # mlprodict
        context = {"oinf": oinf, 'X': X_test[:batch_size].astype(numpy.float32)}
        mt2 = measure_time(
            "oinf.run({'X': X})['variable']", context, div_by_number=True,
            number=10, repeat=repeat)
        mt['pyrt'] = mt2['average'] / mt['size']

        # end
        obs.append(mt)

    df = DataFrame(obs)
    df

.. rst-class:: sphx-glr-script-out

.. code-block:: none

        average  deviation  min_exec  max_exec  repeat  number   size       skl       ort      pyrt
    0  0.030600   0.000161  0.030286  0.031162      50      10      1  0.030600  0.000234  0.009540
    1  0.030283   0.000084  0.030121  0.030600      50      10     10  0.003028  0.000057  0.001440
    2  0.045175   0.001052  0.044146  0.047618      10      10   1000  0.000045  0.000006  0.000244
    3  0.164668   0.000791  0.163987  0.166081       5      10  10000  0.000016  0.000003  0.000145
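The ``skl``, ``ort`` and ``pyrt`` columns are simply the measured batch average divided by the batch size. A small sketch of that normalization; the latency values below are illustrative placeholders, not measurements:

```python
# Per-observation cost: averaged batch latency divided by batch size.
# These "average" values are illustrative placeholders, not measurements.
rows = [
    {"size": 1, "average": 0.0306},
    {"size": 10, "average": 0.0303},
    {"size": 1000, "average": 0.0452},
    {"size": 10000, "average": 0.1647},
]
for row in rows:
    row["mean_obs"] = row["average"] / row["size"]

# The per-observation cost collapses as the batch grows.
print([round(r["mean_obs"], 6) for r in rows])
```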


.. GENERATED FROM PYTHON SOURCE LINES 128-129

Graph.

.. GENERATED FROM PYTHON SOURCE LINES 129-134

.. code-block:: default

    df.set_index('size')[['skl', 'ort', 'pyrt']].plot(
        title="Average prediction time per runtime",
        logx=True, logy=True)

.. image-sg:: /gyexamples/images/sphx_glr_plot_bbegin_measure_time_002.png
    :alt: Average prediction time per runtime
    :srcset: /gyexamples/images/sphx_glr_plot_bbegin_measure_time_002.png
    :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 135-141

:epkg:`ONNX` runtimes are much faster than :epkg:`scikit-learn`
at predicting a single observation. :epkg:`scikit-learn` is optimized
for training and for batch prediction. That explains why
:epkg:`scikit-learn` and the ONNX runtimes seem to converge for big
batches: they use similar implementations, parallelization and
languages (:epkg:`C++`, :epkg:`openmp`).

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 3 minutes 22.568 seconds)

.. _sphx_glr_download_gyexamples_plot_bbegin_measure_time.py:

.. only:: html

    .. container:: sphx-glr-footer sphx-glr-footer-example

        .. container:: sphx-glr-download sphx-glr-download-python

            :download:`Download Python source code: plot_bbegin_measure_time.py `

        .. container:: sphx-glr-download sphx-glr-download-jupyter

            :download:`Download Jupyter notebook: plot_bbegin_measure_time.ipynb `

.. only:: html

    .. rst-class:: sphx-glr-signature

        `Gallery generated by Sphinx-Gallery `_