.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_benchmark_graph_opt.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gyexamples_plot_benchmark_graph_opt.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_benchmark_graph_opt.py:


.. _benchmark-ort-onnx-graph-opt:

Benchmark onnxruntime optimization
==================================

:epkg:`onnxruntime` optimizes the ONNX graph before running the inference.
For example, it tries to fuse a matrix multiplication preceded or followed
by a transpose, choosing the most efficient path.

.. contents::
    :local:

One ONNX file
+++++++++++++

This section creates an ONNX graph if there is not one.

.. GENERATED FROM PYTHON SOURCE LINES 20-36

.. code-block:: default


    import os
    from collections import OrderedDict, Counter
    import numpy
    import onnx
    from cpyquickhelper.numbers.speed_measure import measure_time
    import pandas
    from onnxruntime import InferenceSession, SessionOptions, get_device
    from onnxruntime.capi._pybind_state import (  # pylint: disable=E0611
        SessionIOBinding, OrtDevice as C_OrtDevice, OrtValue as C_OrtValue,
        GraphOptimizationLevel)
    from sklearn.neighbors import RadiusNeighborsRegressor
    from skl2onnx import to_onnx
    from tqdm import tqdm
    from mlprodict.testing.experimental_c_impl.experimental_c import code_optimisation


.. GENERATED FROM PYTHON SOURCE LINES 37-38

Available optimisations on this machine.

.. GENERATED FROM PYTHON SOURCE LINES 38-42

.. code-block:: default


    print(code_optimisation())


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AVX-omp=8


.. GENERATED FROM PYTHON SOURCE LINES 43-45

Building the model
++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 45-61

.. code-block:: default


    filename = "onnx_to_profile.onnx"

    if not os.path.exists(filename):
        print(f"Generate a graph for {filename!r}.")
        X = numpy.random.randn(1000, 10).astype(numpy.float64)
        y = X.sum(axis=1).reshape((-1, 1))

        model = RadiusNeighborsRegressor()
        model.fit(X, y)
        onx = to_onnx(model, X, options={'optim': 'cdist'},
                      target_opset=17)

        with open(filename, "wb") as f:
            f.write(onx.SerializeToString())


.. GENERATED FROM PYTHON SOURCE LINES 62-66

Functions
+++++++++

We need to generate random inputs to test the graph.

.. GENERATED FROM PYTHON SOURCE LINES 66-103

.. code-block:: default


    def random_input(typ, shape, batch):
        if typ == 'tensor(double)':
            dtype = numpy.float64
        elif typ == 'tensor(float)':
            dtype = numpy.float32
        else:
            raise NotImplementedError(
                f"Unable to guess dtype from {typ!r}.")

        if len(shape) <= 1:
            new_shape = shape
        elif shape[0] is None:
            new_shape = tuple([batch] + list(shape[1:]))
        else:
            new_shape = shape
        return numpy.random.randn(*new_shape).astype(dtype)


    def random_feed(sess, batch=10):
        """
        Creates a dictionary of random inputs.

        :param batch: dimension to use as batch dimension if unknown
        :return: dictionary
        """
        inputs = sess.get_inputs()
        res = OrderedDict()
        for inp in inputs:
            name = inp.name
            typ = inp.type
            shape = inp.shape
            res[name] = random_input(typ, shape, batch)
        return res

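As a quick check, these helpers can be exercised on the model built in the
previous section. The snippet below is only an illustrative sketch added here
(it assumes ``onnx_to_profile.onnx`` already exists next to the script); it
prints the shape and dtype of every randomly generated input.

.. code-block:: python

    # Illustrative check only, not part of the generated example.
    sess_check = InferenceSession(filename,
                                  providers=["CPUExecutionProvider"])
    feed_check = random_feed(sess_check, batch=10)
    for name, value in feed_check.items():
        # unknown batch dimensions are replaced by 10
        print(name, value.shape, value.dtype)
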
.. GENERATED FROM PYTHON SOURCE LINES 104-105

A function which calls the API through IO bindings so that the same code
works on any device.

.. GENERATED FROM PYTHON SOURCE LINES 105-118

.. code-block:: default


    def run_with_iobinding(sess, bind, ort_device, feed_ort_value, outputs):
        for name, (value, dtype) in feed_ort_value.items():
            bind.bind_input(name, ort_device, dtype, value.shape(),
                            value.data_ptr())
        for out in outputs:
            bind.bind_output(out, ort_device)
        sess._sess.run_with_iobinding(bind, None)
        ortvalues = bind.get_outputs()
        return [o.numpy() for o in ortvalues]


.. GENERATED FROM PYTHON SOURCE LINES 119-124

Benchmark
+++++++++

Let's choose the device available on this machine.
The batch dimension is set to 200.

.. GENERATED FROM PYTHON SOURCE LINES 124-137

.. code-block:: default


    batch = 200

    if get_device().upper() == 'GPU':
        ort_device = C_OrtDevice(
            C_OrtDevice.cuda(), C_OrtDevice.default_memory(), 0)
        provider = 'CUDAExecutionProvider'
    else:
        ort_device = C_OrtDevice(
            C_OrtDevice.cpu(), C_OrtDevice.default_memory(), 0)
        provider = 'CPUExecutionProvider'
    print(f"provider = {provider!r}")


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    provider = 'CPUExecutionProvider'


.. GENERATED FROM PYTHON SOURCE LINES 138-139

We load the graph.

.. GENERATED FROM PYTHON SOURCE LINES 139-143

.. code-block:: default


    with open(filename, 'rb') as f:
        onx = onnx.load(f)


.. GENERATED FROM PYTHON SOURCE LINES 144-145

Creation of the session.

.. GENERATED FROM PYTHON SOURCE LINES 145-194

.. code-block:: default


    data = []
    files = []
    legend = []
    for graph_opt, name_opt in tqdm([
            (GraphOptimizationLevel.ORT_DISABLE_ALL, "ORT_DISABLE_ALL"),
            (GraphOptimizationLevel.ORT_ENABLE_BASIC, "ORT_ENABLE_BASIC"),
            (GraphOptimizationLevel.ORT_ENABLE_EXTENDED, "ORT_ENABLE_EXTENDED"),
            (GraphOptimizationLevel.ORT_ENABLE_ALL, "ORT_ENABLE_ALL")]):

        so = SessionOptions()
        so.graph_optimization_level = graph_opt
        so.optimized_model_filepath = (
            os.path.split(filename)[-1] + f".optimized.{name_opt}.onnx")
        files.append(so.optimized_model_filepath)
        legend.append(name_opt)
        sess = InferenceSession(onx.SerializeToString(), so,
                                providers=[provider])
        bind = SessionIOBinding(sess._sess)

        #####################################
        # Creates random data
        feed = random_feed(sess, batch)

        #####################################
        # moving the data on CPU or GPU
        feed_ort_value = OrderedDict(
            (name, (C_OrtValue.ortvalue_from_numpy(v, ort_device), v.dtype))
            for name, v in feed.items())
        outputs = [o.name for o in sess.get_outputs()]

        #######################################
        # The profiling.
        obs = measure_time(
            lambda: run_with_iobinding(
                sess, bind, ort_device, feed_ort_value, outputs),
            context=dict(run_with_iobinding=run_with_iobinding,
                         feed_ort_value=feed_ort_value, outputs=outputs,
                         sess=sess, bind=bind, ort_device=ort_device),
            repeat=10, number=10, div_by_number=True)
        obs['name'] = name_opt
        data.append(obs)


    df = pandas.DataFrame(data)
    df


.. rst-class:: sphx-glr-script-out

.. code-block:: none

         average  deviation  min_exec  max_exec  repeat  number     ttime  context_size                 name
    0   0.022404   0.000044  0.022363  0.022495      10      10  0.224037           360      ORT_DISABLE_ALL
    1   0.021755   0.000049  0.021699  0.021852      10      10  0.217548           360     ORT_ENABLE_BASIC
    2   0.021758   0.000038  0.021700  0.021829      10      10  0.217584           360  ORT_ENABLE_EXTENDED
    3   0.021735   0.000020  0.021700  0.021767      10      10  0.217352           360       ORT_ENABLE_ALL


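A comparable measurement can be written with the public
``InferenceSession.run`` API instead of the low-level IO binding used above.
The following snippet is only a sketch: it reuses ``onx``, ``provider``,
``batch``, ``random_feed`` and ``measure_time`` defined earlier in this
example and keeps only two optimization levels.

.. code-block:: python

    # Illustrative sketch: same comparison through the public run() API.
    data_run = []
    for graph_opt, name_opt in [
            (GraphOptimizationLevel.ORT_DISABLE_ALL, "ORT_DISABLE_ALL"),
            (GraphOptimizationLevel.ORT_ENABLE_ALL, "ORT_ENABLE_ALL")]:
        so = SessionOptions()
        so.graph_optimization_level = graph_opt
        sess = InferenceSession(onx.SerializeToString(), so,
                                providers=[provider])
        feed = random_feed(sess, batch)
        obs = measure_time(lambda: sess.run(None, feed),
                           context=dict(sess=sess, feed=feed),
                           repeat=10, number=10, div_by_number=True)
        obs['name'] = name_opt
        data_run.append(obs)
    print(pandas.DataFrame(data_run))
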
.. GENERATED FROM PYTHON SOURCE LINES 195-197

Graph
+++++

.. GENERATED FROM PYTHON SOURCE LINES 197-205

.. code-block:: default


    df = df.set_index('name')
    dev = df[['deviation']].copy()
    dev.columns = ['average']
    ax = df[['average']].plot.bar(yerr=dev)
    ax.set_title(os.path.split(filename)[-1])
    ax.tick_params(axis='x', labelrotation=15)



.. image-sg:: /gyexamples/images/sphx_glr_plot_benchmark_graph_opt_001.png
   :alt: onnx_to_profile.onnx
   :srcset: /gyexamples/images/sphx_glr_plot_benchmark_graph_opt_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 206-207

The results are similar because the optimized models remain very similar.

.. GENERATED FROM PYTHON SOURCE LINES 207-219

.. code-block:: default


    data = []
    for name in files:
        with open(name, "rb") as f:
            onx = onnx.load(f)
        op_names = [op.op_type for op in onx.graph.node]
        data.append(Counter(op_names))

    df = pandas.DataFrame(data).T
    df.columns = legend
    df


.. rst-class:: sphx-glr-script-out

.. code-block:: none

                           ORT_DISABLE_ALL  ORT_ENABLE_BASIC  ORT_ENABLE_EXTENDED  ORT_ENABLE_ALL
    CDist                                1                 1                    1               1
    Less                                 1                 1                    1               1
    Shape                                2                 2                    2               2
    ConstantOfShape                      1                 1                    1               1
    Cast                                 3                 2                    2               2
    ReduceSum                            2                 2                    2               2
    CumSum                               1                 1                    1               1
    Neg                                  1                 1                    1               1
    Add                                  1                 1                    1               1
    Where                                1                 1                    1               1
    Flatten                              1                 1                    1               1
    ArrayFeatureExtractor                1                 1                    1               1
    Reshape                              3                 3                    3               3
    Mul                                  1                 1                    1               1
    Div                                  1                 1                    1               1


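Only the basic optimization level changes this graph: it removes a single
``Cast`` node, and the extended and full levels do not go further. The sketch
below shows one way to compute that difference directly from the saved
optimized files; it reuses the ``files`` list and the ``Counter`` import from
above, and ``count_ops`` is an illustrative helper, not part of the original
example.

.. code-block:: python

    # Illustrative sketch: node types removed between ORT_DISABLE_ALL
    # (first saved file) and ORT_ENABLE_ALL (last saved file).
    def count_ops(path):
        # counts node types in a saved ONNX graph
        with open(path, "rb") as f:
            model_proto = onnx.load(f)
        return Counter(op.op_type for op in model_proto.graph.node)

    before = count_ops(files[0])
    after = count_ops(files[-1])
    # Counter subtraction keeps only positive differences,
    # i.e. the node types removed by the optimizer.
    print("nodes removed by optimization:", dict(before - after))
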
.. GENERATED FROM PYTHON SOURCE LINES 220-221

Graph of the operator counts per optimization level.

.. GENERATED FROM PYTHON SOURCE LINES 221-227

.. code-block:: default


    ax = df.plot.barh()
    ax.set_title(os.path.split(filename)[-1])

    # import matplotlib.pyplot as plt
    # plt.show()


.. image-sg:: /gyexamples/images/sphx_glr_plot_benchmark_graph_opt_002.png
   :alt: onnx_to_profile.onnx
   :srcset: /gyexamples/images/sphx_glr_plot_benchmark_graph_opt_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Text(0.5, 1.0, 'onnx_to_profile.onnx')


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 9.926 seconds)


.. _sphx_glr_download_gyexamples_plot_benchmark_graph_opt.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_benchmark_graph_opt.py <plot_benchmark_graph_opt.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_benchmark_graph_opt.ipynb <plot_benchmark_graph_opt.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_