.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_profile_ort_onnx.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gyexamples_plot_profile_ort_onnx.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_profile_ort_onnx.py:


.. _l-profile-ort-onnx:

Profiling of ONNX graph with onnxruntime
========================================

This example shows how to profile the execution of an ONNX file
with :epkg:`onnxruntime` to find the operators that consume
most of the time. The script assumes that the first dimension, if left
unknown, is the batch dimension.

.. contents::
    :local:

One ONNX file
+++++++++++++

This section creates an ONNX graph if one does not already exist.

.. GENERATED FROM PYTHON SOURCE LINES 21-39

.. code-block:: default

    import os
    import json
    from collections import OrderedDict
    import numpy
    import onnx
    import matplotlib.pyplot as plt
    from mpl_toolkits.axes_grid1.axes_divider import make_axes_area_auto_adjustable
    import pandas
    from onnxruntime import InferenceSession, SessionOptions, get_device
    from onnxruntime.capi._pybind_state import (  # pylint: disable=E0611
        SessionIOBinding, OrtDevice as C_OrtDevice, OrtValue as C_OrtValue)
    from sklearn.neighbors import RadiusNeighborsRegressor
    from skl2onnx import to_onnx
    from tqdm import tqdm
    from mlprodict.testing.experimental_c_impl.experimental_c import code_optimisation
    from mlprodict.onnxrt.ops_whole.session import OnnxWholeSession


.. GENERATED FROM PYTHON SOURCE LINES 40-41

Optimisations available on this machine.

.. GENERATED FROM PYTHON SOURCE LINES 41-45

.. code-block:: default


    print(code_optimisation())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AVX-omp=8


.. GENERATED FROM PYTHON SOURCE LINES 46-48

Building the model
++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 48-64

.. code-block:: default


    filename = "onnx_to_profile.onnx"


    if not os.path.exists(filename):
        print(f"Generate a graph for {filename!r}.")
        X = numpy.random.randn(1000, 10).astype(numpy.float64)
        y = X.sum(axis=1).reshape((-1, 1))

        model = RadiusNeighborsRegressor()
        model.fit(X, y)
        onx = to_onnx(model, X, options={'optim': 'cdist'}, target_opset=17)

        with open(filename, "wb") as f:
            f.write(onx.SerializeToString())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Generate a graph for 'onnx_to_profile.onnx'.


.. GENERATED FROM PYTHON SOURCE LINES 65-69

Functions
+++++++++

We need to generate random inputs to test the graph.

.. GENERATED FROM PYTHON SOURCE LINES 69-106

.. code-block:: default



    def random_input(typ, shape, batch):
        if typ == 'tensor(double)':
            dtype = numpy.float64
        elif typ == 'tensor(float)':
            dtype = numpy.float32
        else:
            raise NotImplementedError(
                f"Unable to guess dtype from {typ!r}.")

        # An unknown first dimension is assumed to be the batch dimension.
        if len(shape) <= 1:
            new_shape = shape
        elif shape[0] is None:
            new_shape = tuple([batch] + list(shape[1:]))
        else:
            new_shape = shape
        return numpy.random.randn(*new_shape).astype(dtype)


    def random_feed(sess, batch=10):
        """
        Creates a dictionary of random inputs.

        :param sess: session whose inputs are described by `get_inputs()`
        :param batch: dimension to use as batch dimension if unknown
        :return: dictionary
        """
        inputs = sess.get_inputs()
        res = OrderedDict()
        for inp in inputs:
            name = inp.name
            typ = inp.type
            shape = inp.shape
            res[name] = random_input(typ, shape, batch)
        return res



.. GENERATED FROM PYTHON SOURCE LINES 107-112

Profiling
+++++++++

Let's choose the device available on this machine.
The batch dimension is set to 10.

.. GENERATED FROM PYTHON SOURCE LINES 112-126

.. code-block:: default


    batch = 10

    if get_device().upper() == 'GPU':
        ort_device = C_OrtDevice(
            C_OrtDevice.cuda(), C_OrtDevice.default_memory(), 0)
        provider = 'CUDAExecutionProvider'
    else:
        ort_device = C_OrtDevice(
            C_OrtDevice.cpu(), C_OrtDevice.default_memory(), 0)
        provider = 'CPUExecutionProvider'
    print(f"provider = {provider!r}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    provider = 'CPUExecutionProvider'


.. GENERATED FROM PYTHON SOURCE LINES 127-128

We load the graph.

.. GENERATED FROM PYTHON SOURCE LINES 128-132

.. code-block:: default


    with open(filename, 'rb') as f:
        onx = onnx.load(f)


.. GENERATED FROM PYTHON SOURCE LINES 133-134

Creation of the session.

.. GENERATED FROM PYTHON SOURCE LINES 134-144

.. code-block:: default


    so = SessionOptions()
    so.enable_profiling = True
    so.optimized_model_filepath = os.path.split(filename)[-1] + ".optimized.onnx"
    sess = InferenceSession(onx.SerializeToString(), so,
                            providers=[provider])
    bind = SessionIOBinding(sess._sess)

    print("graph_optimization_level:", so.graph_optimization_level)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    graph_optimization_level: GraphOptimizationLevel.ORT_ENABLE_ALL


.. GENERATED FROM PYTHON SOURCE LINES 145-146

Create random data.

.. GENERATED FROM PYTHON SOURCE LINES 146-148

.. code-block:: default

    feed = random_feed(sess, batch)


.. GENERATED FROM PYTHON SOURCE LINES 149-150

Move the data to CPU or GPU.

.. GENERATED FROM PYTHON SOURCE LINES 150-155

.. code-block:: default

    feed_ort_value = OrderedDict(
        (name, (C_OrtValue.ortvalue_from_numpy(v, ort_device), v.dtype))
        for name, v in feed.items())
    outputs = [o.name for o in sess.get_outputs()]


.. GENERATED FROM PYTHON SOURCE LINES 156-157

A function which calls the API for any device.

.. GENERATED FROM PYTHON SOURCE LINES 157-169

.. code-block:: default



    def run_with_iobinding(sess, bind, ort_device, feed_ort_value, outputs):
        # Bind every input to the memory already allocated on the device.
        for name, (value, dtype) in feed_ort_value.items():
            bind.bind_input(name, ort_device, dtype, value.shape(),
                            value.data_ptr())
        for out in outputs:
            bind.bind_output(out, ort_device)
        sess._sess.run_with_iobinding(bind, None)
        ortvalues = bind.get_outputs()
        return [o.numpy() for o in ortvalues]



.. GENERATED FROM PYTHON SOURCE LINES 170-171

The profiling.

.. GENERATED FROM PYTHON SOURCE LINES 171-182

.. code-block:: default


    for i in tqdm(range(0, 10)):
        run_with_iobinding(sess, bind, ort_device, feed_ort_value, outputs)

    prof = sess.end_profiling()
    with open(prof, "r") as f:
        js = json.load(f)
    df = pandas.DataFrame(OnnxWholeSession.process_profiling(js))
    df


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    0%| | 0/10 [00:00

.. csv-table::
    :header-rows: 1

    ,cat,pid,tid,dur,ts,ph,name,args_op_name,args_thread_scheduling_stats,args_input_type_shape,args_activation_size,args_parameter_size,args_graph_index,args_output_size,args_provider,args_output_type_shape,args_exec_plan_index
    0,Session,32082,32082,875,5,X,model_loading_array,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
    1,Session,32082,32082,11960,967,X,session_initialization,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
    2,Node,32082,32082,1,20758,X,cond_CDist_fence_before,CDist,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
    3,Node,32082,32082,398,20768,X,cond_CDist_kernel_time,CDist,"{'main_thread': {'thread_pool_name': 'session-...","[{'double': [10, 10]}, {'double': [1000, 10]}]",800,80000,0,80000,CPUExecutionProvider,"[{'double': [10, 1000]}]",0
    4,Node,32082,32082,0,21181,X,cond_CDist_fence_after,CDist,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
    ...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
    617,Node,32082,32082,0,53397,X,Re_Reshape_fence_before,Reshape,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
    618,Node,32082,32082,39,53399,X,Re_Reshape_kernel_time,Reshape,"{'main_thread': {'thread_pool_name': 'session-...","[{'double': [10]}, {'int64': [2]}]",80,16,20,80,CPUExecutionProvider,"[{'double': [10, 1]}]",20
    619,Node,32082,32082,0,53449,X,Re_Reshape_fence_after,Reshape,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
    620,Session,32082,32082,2813,50641,X,SequentialExecutor::Execute,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
    621,Session,32082,32082,2835,50629,X,model_run,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN

622 rows × 17 columns
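
In the rows shown above, each node appears three times
(``fence_before``, ``kernel_time``, ``fence_after``) and the session-level
events carry no operator name. As a minimal sketch, assuming the column names
of the table above, the kernel time per operator type could be isolated as
follows. The ``sample`` rows and the ``kernel_time_by_op`` helper are
hypothetical; they only illustrate the filtering and are not part of the
example script.

.. code-block:: python

    import pandas

    # Hypothetical rows mimicking a few columns of the profiling DataFrame.
    rows = [
        {"cat": "Node", "name": "cond_CDist_fence_before",
         "dur": 1, "args_op_name": "CDist"},
        {"cat": "Node", "name": "cond_CDist_kernel_time",
         "dur": 398, "args_op_name": "CDist"},
        {"cat": "Node", "name": "cond_CDist_fence_after",
         "dur": 0, "args_op_name": "CDist"},
        {"cat": "Node", "name": "Re_Reshape_kernel_time",
         "dur": 39, "args_op_name": "Reshape"},
        {"cat": "Session", "name": "model_run",
         "dur": 2835, "args_op_name": None},
    ]
    sample = pandas.DataFrame(rows)


    def kernel_time_by_op(df):
        """Sums 'dur' over kernel_time events, grouped by operator type."""
        # Keep node events only, and among them the kernel_time entries.
        mask = (df["cat"] == "Node") & df["name"].str.endswith("_kernel_time")
        by_op = df.loc[mask].groupby("args_op_name")["dur"].sum()
        return by_op.sort_values(ascending=False)


    kernel = kernel_time_by_op(sample)
    print(kernel)

Applied to the real ``df``, the same filter would leave out the fence events
and the session-level rows such as ``model_run``, which would otherwise be
mixed with the per-node durations.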



.. GENERATED FROM PYTHON SOURCE LINES 183-184

The first graph is by operator type.

.. GENERATED FROM PYTHON SOURCE LINES 184-200

.. code-block:: default


    gr_dur = df[['dur', "args_op_name"]].groupby(
        "args_op_name").sum().sort_values('dur')
    total = gr_dur['dur'].sum()
    gr_dur /= total
    gr_n = df[['dur', "args_op_name"]].groupby(
        "args_op_name").count().sort_values('dur')
    gr_n = gr_n.loc[gr_dur.index, :]

    fig, ax = plt.subplots(1, 2, figsize=(8, 4))
    gr_dur.plot.barh(ax=ax[0])
    gr_n.plot.barh(ax=ax[1])
    ax[0].set_title("duration")
    ax[1].set_title("n occurrences")
    fig.suptitle(os.path.split(filename)[-1])



.. image-sg:: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_001.png
   :alt: onnx_to_profile.onnx, duration, n occurrences
   :srcset: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Text(0.5, 0.98, 'onnx_to_profile.onnx')



.. GENERATED FROM PYTHON SOURCE LINES 201-202

The second graph is by operator type and node name.

.. GENERATED FROM PYTHON SOURCE LINES 202-212

.. code-block:: default


    gr_dur = df[['dur', "args_op_name", "name"]].groupby(
        ["args_op_name", "name"]).sum().sort_values('dur')
    total = gr_dur['dur'].sum()
    gr_dur /= total
    if gr_dur.shape[0] > 30:
        gr_dur = gr_dur.tail(n=30)

    gr_dur.head(n=5)


.. csv-table::
    :header-rows: 1

    args_op_name,name,dur
    Cast,nnbin_Cast_fence_after,0.0
    Cast,nnbin_Cast_fence_before,0.0
    Reshape,normr_Reshape_fence_before,0.0
    ConstantOfShape,arange_ConstantOfShape_fence_after,0.0
    Reshape,normr_Reshape_fence_after,0.0
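
Because ``gr_dur`` is sorted in ascending order, ``head`` returns the cheapest
entries, which is why every row above shows 0.0. The most expensive nodes sit
at the other end of the frame. The following is a minimal sketch, built on
hypothetical sample rows rather than the real profile, of how the largest
shares could be listed instead.

.. code-block:: python

    import pandas

    # Hypothetical durations; the real values come from the profiling file.
    sample = pandas.DataFrame({
        "args_op_name": ["CDist", "CDist", "Reshape", "Cast"],
        "name": ["cond_CDist_kernel_time", "cond_CDist_fence_before",
                 "Re_Reshape_kernel_time", "nnbin_Cast_fence_after"],
        "dur": [398, 1, 39, 0],
    })

    # Total duration per (operator type, event name), as a share of the whole.
    share = sample.groupby(["args_op_name", "name"])["dur"].sum()
    share = share / share.sum()

    # The two largest shares, most expensive first.
    top = share.nlargest(2)
    print(top)

With the real data, ``gr_dur.tail(n=5)`` would play the same role, since the
frame is already sorted by duration.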


.. GENERATED FROM PYTHON SOURCE LINES 213-214

And the graph.

.. GENERATED FROM PYTHON SOURCE LINES 214-222

.. code-block:: default


    _, ax = plt.subplots(1, 1, figsize=(8, gr_dur.shape[0] // 2))
    gr_dur.plot.barh(ax=ax)
    ax.set_title("duration per node")
    for label in (ax.get_xticklabels() + ax.get_yticklabels()):
        label.set_fontsize(7)
    make_axes_area_auto_adjustable(ax)



.. image-sg:: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_002.png
   :alt: duration per node
   :srcset: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 223-224

CumSum is where the execution spends most of its time.

.. GENERATED FROM PYTHON SOURCE LINES 224-226

.. code-block:: default


    # plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  3.196 seconds)


.. _sphx_glr_download_gyexamples_plot_profile_ort_onnx.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_profile_ort_onnx.py <plot_profile_ort_onnx.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_profile_ort_onnx.ipynb <plot_profile_ort_onnx.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_