.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_profile_ort_onnx.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gyexamples_plot_profile_ort_onnx.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_profile_ort_onnx.py:


.. _l-profile-ort-onnx:

Profiling of ONNX graph with onnxruntime
========================================

This example shows how to profile the execution of an ONNX file
with :epkg:`onnxruntime` to find the operators which consume
most of the time. The script assumes the first dimension, if left
unknown, is the batch dimension.

.. contents::
    :local:

One ONNX file
+++++++++++++

This section creates an ONNX graph if one does not already exist.

.. GENERATED FROM PYTHON SOURCE LINES 21-39

.. code-block:: default

    import os
    import json
    from collections import OrderedDict
    import numpy
    import onnx
    import matplotlib.pyplot as plt
    from mpl_toolkits.axes_grid1.axes_divider import make_axes_area_auto_adjustable
    import pandas
    from onnxruntime import InferenceSession, SessionOptions, get_device
    from onnxruntime.capi._pybind_state import (  # pylint: disable=E0611
        SessionIOBinding, OrtDevice as C_OrtDevice, OrtValue as C_OrtValue)
    from sklearn.neighbors import RadiusNeighborsRegressor
    from skl2onnx import to_onnx
    from tqdm import tqdm
    from mlprodict.testing.experimental_c_impl.experimental_c import code_optimisation
    from mlprodict.onnxrt.ops_whole.session import OnnxWholeSession


.. GENERATED FROM PYTHON SOURCE LINES 40-41

Optimisations available on this machine.

.. GENERATED FROM PYTHON SOURCE LINES 41-45

.. code-block:: default


    print(code_optimisation())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AVX-omp=8


.. GENERATED FROM PYTHON SOURCE LINES 46-48

Building the model
++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 48-64

.. code-block:: default


    filename = "onnx_to_profile.onnx"


    if not os.path.exists(filename):
        print(f"Generate a graph for {filename!r}.")
        X = numpy.random.randn(1000, 10).astype(numpy.float64)
        y = X.sum(axis=1).reshape((-1, 1))

        model = RadiusNeighborsRegressor()
        model.fit(X, y)
        onx = to_onnx(model, X, options={'optim': 'cdist'}, target_opset=17)

        with open(filename, "wb") as f:
            f.write(onx.SerializeToString())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Generate a graph for 'onnx_to_profile.onnx'.


.. GENERATED FROM PYTHON SOURCE LINES 65-69

Functions
+++++++++

We need to generate random inputs to test the graph.

.. GENERATED FROM PYTHON SOURCE LINES 69-106

.. code-block:: default


    def random_input(typ, shape, batch):
        if typ == 'tensor(double)':
            dtype = numpy.float64
        elif typ == 'tensor(float)':
            dtype = numpy.float32
        else:
            raise NotImplementedError(
                f"Unable to guess dtype from {typ!r}.")

        # The first dimension is taken as the batch dimension when it is
        # unknown (None), whatever the rank of the input.
        if len(shape) > 0 and shape[0] is None:
            new_shape = tuple([batch] + list(shape[1:]))
        else:
            new_shape = shape
        return numpy.random.randn(*new_shape).astype(dtype)


    def random_feed(sess, batch=10):
        """
        Creates a dictionary of random inputs.

        :param batch: dimension to use as batch dimension if unknown
        :return: dictionary
        """
        inputs = sess.get_inputs()
        res = OrderedDict()
        for inp in inputs:
            name = inp.name
            typ = inp.type
            shape = inp.shape
            res[name] = random_input(typ, shape, batch)
        return res
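
The batch-dimension rule can be looked at in isolation. The helper
``resolve_shape`` below is hypothetical and not part of the original script.
It is a simplified sketch which assumes an unknown dimension is reported
as ``None`` and replaces a leading ``None`` with the batch size.

.. code-block:: python

    def resolve_shape(shape, batch):
        """Return a concrete shape, using `batch` for an unknown first dimension."""
        # Only the first dimension is treated as the batch dimension.
        if len(shape) > 0 and shape[0] is None:
            return tuple([batch] + list(shape[1:]))
        # Fully known shapes are returned unchanged (as a tuple).
        return tuple(shape)


    print(resolve_shape([None, 10], batch=10))  # (10, 10)
    print(resolve_shape([1000, 10], batch=10))  # (1000, 10)

A shape which is already fully specified ignores the ``batch`` argument.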


.. GENERATED FROM PYTHON SOURCE LINES 107-112

Profiling
+++++++++

Let's choose the device available on this machine.
The batch dimension is set to 10.

.. GENERATED FROM PYTHON SOURCE LINES 112-126

.. code-block:: default


    batch = 10

    if get_device().upper() == 'GPU':
        ort_device = C_OrtDevice(
            C_OrtDevice.cuda(), C_OrtDevice.default_memory(), 0)
        provider = 'CUDAExecutionProvider'
    else:
        ort_device = C_OrtDevice(
            C_OrtDevice.cpu(), C_OrtDevice.default_memory(), 0)
        provider = 'CPUExecutionProvider'

    print(f"provider = {provider!r}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    provider = 'CPUExecutionProvider'


.. GENERATED FROM PYTHON SOURCE LINES 127-128

We load the graph.

.. GENERATED FROM PYTHON SOURCE LINES 128-132

.. code-block:: default


    with open(filename, 'rb') as f:
        onx = onnx.load(f)


.. GENERATED FROM PYTHON SOURCE LINES 133-134

Creation of the session.

.. GENERATED FROM PYTHON SOURCE LINES 134-144

.. code-block:: default


    so = SessionOptions()
    so.enable_profiling = True
    so.optimized_model_filepath = os.path.split(filename)[-1] + ".optimized.onnx"
    sess = InferenceSession(onx.SerializeToString(), so,
                            providers=[provider])
    bind = SessionIOBinding(sess._sess)

    print("graph_optimization_level:", so.graph_optimization_level)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    graph_optimization_level: GraphOptimizationLevel.ORT_ENABLE_ALL


.. GENERATED FROM PYTHON SOURCE LINES 145-146

Create random data.

.. GENERATED FROM PYTHON SOURCE LINES 146-148

.. code-block:: default

    feed = random_feed(sess, batch)


.. GENERATED FROM PYTHON SOURCE LINES 149-150

Move the data to the CPU or the GPU.

.. GENERATED FROM PYTHON SOURCE LINES 150-155

.. code-block:: default

    feed_ort_value = OrderedDict(
        (name, (C_OrtValue.ortvalue_from_numpy(v, ort_device), v.dtype))
        for name, v in feed.items())
    outputs = [o.name for o in sess.get_outputs()]


.. GENERATED FROM PYTHON SOURCE LINES 156-157

A function which calls the API for any device.

.. GENERATED FROM PYTHON SOURCE LINES 157-169

.. code-block:: default


    def run_with_iobinding(sess, bind, ort_device, feed_ort_value, outputs):
        for name, (value, dtype) in feed_ort_value.items():
            bind.bind_input(name, ort_device, dtype, value.shape(),
                            value.data_ptr())
        for out in outputs:
            bind.bind_output(out, ort_device)
        sess._sess.run_with_iobinding(bind, None)
        ortvalues = bind.get_outputs()
        return [o.numpy() for o in ortvalues]


.. GENERATED FROM PYTHON SOURCE LINES 170-171

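The profiling: the model is run ten times, then the profile written
by :epkg:`onnxruntime` is loaded into a DataFrame.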

.. GENERATED FROM PYTHON SOURCE LINES 171-182

.. code-block:: default


    for i in tqdm(range(0, 10)):
        run_with_iobinding(sess, bind, ort_device, feed_ort_value, outputs)

    prof = sess.end_profiling()
    with open(prof, "r") as f:
        js = json.load(f)
    df = pandas.DataFrame(OnnxWholeSession.process_profiling(js))
    df


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


      0%|          | 0/10 [00:00<?, ?it/s]
    100%|##########| 10/10 [00:00<00:00, 300.04it/s]


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th>cat</th>
          <th>pid</th>
          <th>tid</th>
          <th>dur</th>
          <th>ts</th>
          <th>ph</th>
          <th>name</th>
          <th>args_op_name</th>
          <th>args_thread_scheduling_stats</th>
          <th>args_input_type_shape</th>
          <th>args_activation_size</th>
          <th>args_parameter_size</th>
          <th>args_graph_index</th>
          <th>args_output_size</th>
          <th>args_provider</th>
          <th>args_output_type_shape</th>
          <th>args_exec_plan_index</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>0</th>
          <td>Session</td>
          <td>32082</td>
          <td>32082</td>
          <td>875</td>
          <td>5</td>
          <td>X</td>
          <td>model_loading_array</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>1</th>
          <td>Session</td>
          <td>32082</td>
          <td>32082</td>
          <td>11960</td>
          <td>967</td>
          <td>X</td>
          <td>session_initialization</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>2</th>
          <td>Node</td>
          <td>32082</td>
          <td>32082</td>
          <td>1</td>
          <td>20758</td>
          <td>X</td>
          <td>cond_CDist_fence_before</td>
          <td>CDist</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>3</th>
          <td>Node</td>
          <td>32082</td>
          <td>32082</td>
          <td>398</td>
          <td>20768</td>
          <td>X</td>
          <td>cond_CDist_kernel_time</td>
          <td>CDist</td>
          <td>{'main_thread': {'thread_pool_name': 'session-...</td>
          <td>[{'double': [10, 10]}, {'double': [1000, 10]}]</td>
          <td>800</td>
          <td>80000</td>
          <td>0</td>
          <td>80000</td>
          <td>CPUExecutionProvider</td>
          <td>[{'double': [10, 1000]}]</td>
          <td>0</td>
        </tr>
        <tr>
          <th>4</th>
          <td>Node</td>
          <td>32082</td>
          <td>32082</td>
          <td>0</td>
          <td>21181</td>
          <td>X</td>
          <td>cond_CDist_fence_after</td>
          <td>CDist</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>...</th>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
          <td>...</td>
        </tr>
        <tr>
          <th>617</th>
          <td>Node</td>
          <td>32082</td>
          <td>32082</td>
          <td>0</td>
          <td>53397</td>
          <td>X</td>
          <td>Re_Reshape_fence_before</td>
          <td>Reshape</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>618</th>
          <td>Node</td>
          <td>32082</td>
          <td>32082</td>
          <td>39</td>
          <td>53399</td>
          <td>X</td>
          <td>Re_Reshape_kernel_time</td>
          <td>Reshape</td>
          <td>{'main_thread': {'thread_pool_name': 'session-...</td>
          <td>[{'double': [10]}, {'int64': [2]}]</td>
          <td>80</td>
          <td>16</td>
          <td>20</td>
          <td>80</td>
          <td>CPUExecutionProvider</td>
          <td>[{'double': [10, 1]}]</td>
          <td>20</td>
        </tr>
        <tr>
          <th>619</th>
          <td>Node</td>
          <td>32082</td>
          <td>32082</td>
          <td>0</td>
          <td>53449</td>
          <td>X</td>
          <td>Re_Reshape_fence_after</td>
          <td>Reshape</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>620</th>
          <td>Session</td>
          <td>32082</td>
          <td>32082</td>
          <td>2813</td>
          <td>50641</td>
          <td>X</td>
          <td>SequentialExecutor::Execute</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
        <tr>
          <th>621</th>
          <td>Session</td>
          <td>32082</td>
          <td>32082</td>
          <td>2835</td>
          <td>50629</td>
          <td>X</td>
          <td>model_run</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
          <td>NaN</td>
        </tr>
      </tbody>
    </table>
    <p>622 rows × 17 columns</p>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 183-184
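
The columns of the table above suggest that every profiled event is a
dictionary whose nested ``args`` entry is flattened into ``args_*`` columns.
The function ``flatten_event`` below is a hypothetical sketch of such a
flattening, not the implementation used by ``process_profiling``.
The two events are a hand-written excerpt reusing values from the table.

.. code-block:: python

    # Hand-written excerpt shaped like the events of the profile
    # (only a few keys are kept for brevity).
    events = [
        {"cat": "Session", "dur": 875, "ts": 5, "ph": "X",
         "name": "model_loading_array", "args": {}},
        {"cat": "Node", "dur": 398, "ts": 20768, "ph": "X",
         "name": "cond_CDist_kernel_time",
         "args": {"op_name": "CDist", "provider": "CPUExecutionProvider"}},
    ]


    def flatten_event(event):
        """Move every key of event['args'] to the top level as 'args_<key>'."""
        row = {k: v for k, v in event.items() if k != "args"}
        for key, value in event.get("args", {}).items():
            row["args_" + key] = value
        return row


    rows = [flatten_event(e) for e in events]
    print(rows[1]["args_op_name"])  # CDist

Events without ``args`` simply lack the ``args_*`` keys, which is
consistent with the ``NaN`` values shown for session-level rows.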

The first graph shows the share of the total duration and the number
of events by operator type.

.. GENERATED FROM PYTHON SOURCE LINES 184-200

.. code-block:: default


    gr_dur = df[['dur', "args_op_name"]].groupby(
        "args_op_name").sum().sort_values('dur')
    total = gr_dur['dur'].sum()
    gr_dur /= total
    gr_n = df[['dur', "args_op_name"]].groupby(
        "args_op_name").count().sort_values('dur')
    gr_n = gr_n.loc[gr_dur.index, :]

    fig, ax = plt.subplots(1, 2, figsize=(8, 4))
    gr_dur.plot.barh(ax=ax[0])
    gr_n.plot.barh(ax=ax[1])
    ax[0].set_title("duration")
    ax[1].set_title("n occurrences")
    fig.suptitle(os.path.split(filename)[-1])


.. image-sg:: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_001.png
   :alt: onnx_to_profile.onnx, duration, n occurrences
   :srcset: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Text(0.5, 0.98, 'onnx_to_profile.onnx')
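
The aggregation behind this graph can be sketched without pandas.
The pairs in ``records`` are hypothetical stand-ins for the ``args_op_name``
and ``dur`` columns, and the numbers are illustrative only. Rows without
an operator name (the session-level events) are assumed to be left out,
as pandas drops missing group keys by default.

.. code-block:: python

    from collections import defaultdict

    # Hypothetical (operator type, duration) pairs, illustrative values.
    records = [("CDist", 400), ("Reshape", 40), ("CDist", 360), ("Reshape", 0)]

    durations = defaultdict(int)  # summed duration per operator type
    counts = defaultdict(int)     # number of events per operator type
    for op_name, dur in records:
        durations[op_name] += dur
        counts[op_name] += 1

    # Normalise so that the shares add up to 1, like `gr_dur /= total`.
    total = sum(durations.values())
    shares = {op: dur / total for op, dur in durations.items()}

    print(shares)        # {'CDist': 0.95, 'Reshape': 0.05}
    print(dict(counts))  # {'CDist': 2, 'Reshape': 2}

Note that the count includes every event attached to an operator,
fence events as well as kernel executions.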


.. GENERATED FROM PYTHON SOURCE LINES 201-202

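The second graph is by node name. Only the 30 most expensive entries
are kept. The rows are sorted by increasing duration, so the first ones
displayed below are the cheapest.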

.. GENERATED FROM PYTHON SOURCE LINES 202-212

.. code-block:: default


    gr_dur = df[['dur', "args_op_name", "name"]].groupby(
        ["args_op_name", "name"]).sum().sort_values('dur')
    total = gr_dur['dur'].sum()
    gr_dur /= total
    if gr_dur.shape[0] > 30:
        gr_dur = gr_dur.tail(n=30)

    gr_dur.head(n=5)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th></th>
          <th>dur</th>
        </tr>
        <tr>
          <th>args_op_name</th>
          <th>name</th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th rowspan="2" valign="top">Cast</th>
          <th>nnbin_Cast_fence_after</th>
          <td>0.0</td>
        </tr>
        <tr>
          <th>nnbin_Cast_fence_before</th>
          <td>0.0</td>
        </tr>
        <tr>
          <th>Reshape</th>
          <th>normr_Reshape_fence_before</th>
          <td>0.0</td>
        </tr>
        <tr>
          <th>ConstantOfShape</th>
          <th>arange_ConstantOfShape_fence_after</th>
          <td>0.0</td>
        </tr>
        <tr>
          <th>Reshape</th>
          <th>normr_Reshape_fence_after</th>
          <td>0.0</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 213-214
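
The cheapest rows are fence events with a null duration. A possible
refinement, not done in the original script, is to keep only the events
whose name ends with ``_kernel_time`` before selecting the most expensive
ones. The function ``top_kernels`` and the dictionary below are hypothetical;
the names and durations are copied from the first table.

.. code-block:: python

    # Event name -> duration, values copied from the table of events.
    durations = {
        "cond_CDist_fence_before": 1,
        "cond_CDist_kernel_time": 398,
        "cond_CDist_fence_after": 0,
        "Re_Reshape_fence_before": 0,
        "Re_Reshape_kernel_time": 39,
        "Re_Reshape_fence_after": 0,
    }


    def top_kernels(durations, n=30):
        """Keep kernel events only and return the n largest, ascending."""
        kernels = {name: dur for name, dur in durations.items()
                   if name.endswith("_kernel_time")}
        ordered = sorted(kernels.items(), key=lambda item: item[1])
        # Same idea as DataFrame.tail(n) after an ascending sort.
        return ordered[-n:]


    print(top_kernels(durations, n=1))  # [('cond_CDist_kernel_time', 398)]

Taking the last ``n`` items of an ascending sort plays the role of
``tail(n=30)`` in the code above.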

And the corresponding graph.

.. GENERATED FROM PYTHON SOURCE LINES 214-222

.. code-block:: default


    _, ax = plt.subplots(1, 1, figsize=(8, gr_dur.shape[0] // 2))
    gr_dur.plot.barh(ax=ax)
    ax.set_title("duration per node")
    for label in (ax.get_xticklabels() + ax.get_yticklabels()):
        label.set_fontsize(7)
    make_axes_area_auto_adjustable(ax)


.. image-sg:: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_002.png
   :alt: duration per node
   :srcset: /gyexamples/images/sphx_glr_plot_profile_ort_onnx_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 223-224

CumSum is where the execution spends most of its time.

.. GENERATED FROM PYTHON SOURCE LINES 224-226

.. code-block:: default


    # plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  3.196 seconds)


.. _sphx_glr_download_gyexamples_plot_profile_ort_onnx.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_profile_ort_onnx.py <plot_profile_ort_onnx.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_profile_ort_onnx.ipynb <plot_profile_ort_onnx.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_