.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_dbegin_options_zipmap.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gyexamples_plot_dbegin_options_zipmap.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_dbegin_options_zipmap.py:


.. _l-tutorial-example-zipmap:

Choose the appropriate output of a classifier
=============================================

A scikit-learn classifier usually returns a matrix of probabilities.
By default, *sklearn-onnx* converts that matrix
into a list of dictionaries where each probability is mapped
to its class id or name. That mechanism retains the class names
but makes prediction slower. Let's see what other options are available.

.. contents::
    :local:

Train a model and convert it
++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 20-44

.. code-block:: default

    from timeit import repeat
    import numpy
    import sklearn
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    import onnxruntime as rt
    import onnx
    import skl2onnx
    from skl2onnx.common.data_types import FloatTensorType
    from skl2onnx import to_onnx
    from sklearn.linear_model import LogisticRegression
    from sklearn.multioutput import MultiOutputClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X = X.astype(numpy.float32)
    y = y * 2 + 10  # to get labels different from [0, 1, 2]
    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clr = LogisticRegression(max_iter=500)
    clr.fit(X_train, y_train)
    print(clr)

    onx = to_onnx(clr, X_train, target_opset={'': 14, 'ai.onnx.ml': 2})


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    LogisticRegression(max_iter=500)


.. GENERATED FROM PYTHON SOURCE LINES 45-50

Default behaviour: zipmap=True
++++++++++++++++++++++++++++++

The output type for the probabilities is a list of
dictionaries.

.. GENERATED FROM PYTHON SOURCE LINES 50-58

.. code-block:: default


    sess = rt.InferenceSession(onx.SerializeToString(),
                               providers=['CPUExecutionProvider'])
    res = sess.run(None, {'X': X_test})
    print(res[1][:2])
    print("probabilities type:", type(res[1]))
    print("type for the first observations:", type(res[1][0]))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [{10: 1.1451697901065927e-05, 12: 0.020342448726296425, 14: 0.9796461462974548}, {10: 0.01027061976492405, 12: 0.7854514718055725, 14: 0.20427796244621277}]
    probabilities type: <class 'list'>
    type for the first observations: <class 'dict'>
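
When downstream code expects a matrix, the list of dictionaries has to be
converted back. The following is a minimal sketch in plain Python, assuming
every dictionary holds the same keys. The helper name ``dicts_to_matrix`` is
hypothetical and the values are rounded from the output above.

.. code-block:: python

    def dicts_to_matrix(rows):
        """Convert ZipMap-style output (a list of dicts) into (labels, matrix)."""
        # Take the class labels from the first row and fix the column order.
        labels = sorted(rows[0])
        matrix = [[row[label] for label in labels] for row in rows]
        return labels, matrix


    # Values rounded from the output printed above.
    rows = [
        {10: 1.145e-05, 12: 0.02034, 14: 0.97965},
        {10: 0.01027, 12: 0.78545, 14: 0.20428},
    ]
    labels, matrix = dicts_to_matrix(rows)
    print(labels)     # class ids in column order
    print(matrix[0])  # probabilities of the first observation

Under the same assumption, ``dicts_to_matrix(res[1])`` should work on the
result of the session above.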


.. GENERATED FROM PYTHON SOURCE LINES 59-63

Option zipmap=False
+++++++++++++++++++

Probabilities are now a matrix.

.. GENERATED FROM PYTHON SOURCE LINES 63-76

.. code-block:: default


    # to_onnx infers the input type from X_train and names the input 'X'.
    options = {id(clr): {'zipmap': False}}
    onx2 = to_onnx(clr, X_train, options=options,
                   target_opset={'': 14, 'ai.onnx.ml': 2})

    sess2 = rt.InferenceSession(onx2.SerializeToString(),
                                providers=['CPUExecutionProvider'])
    res2 = sess2.run(None, {'X': X_test})
    print(res2[1][:2])
    print("probabilities type:", type(res2[1]))
    print("type for the first observations:", type(res2[1][0]))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[1.1451698e-05 2.0342449e-02 9.7964615e-01]
     [1.0270620e-02 7.8545147e-01 2.0427796e-01]]
    probabilities type: <class 'numpy.ndarray'>
    type for the first observations: <class 'numpy.ndarray'>
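
Without ZipMap, the matrix columns are no longer tied to class labels, so the
caller has to keep the labels somewhere else. The sketch below, in plain
Python, assumes the columns follow the order of the labels as stored in
``clr.classes_``. The helper name ``predicted_labels`` is hypothetical and the
values are rounded from the output above.

.. code-block:: python

    def predicted_labels(probabilities, classes):
        """Return, for every row, the class with the highest probability."""
        result = []
        for row in probabilities:
            # Index of the largest probability in this row.
            best = max(range(len(row)), key=lambda i: row[i])
            result.append(classes[best])
        return result


    # Values rounded from the output above; labels as in ``clr.classes_``.
    probabilities = [
        [1.145e-05, 0.02034, 0.97965],
        [0.01027, 0.78545, 0.20428],
    ]
    classes = [10, 12, 14]
    print(predicted_labels(probabilities, classes))

In the example above, the matrix would be ``res2[1]`` and the labels would
come from the trained model.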


.. GENERATED FROM PYTHON SOURCE LINES 77-83

Option zipmap='columns'
+++++++++++++++++++++++

This option removes the final operator ZipMap and splits
the probabilities into columns. The final model produces
one output for the label, and one output per class.

.. GENERATED FROM PYTHON SOURCE LINES 83-96

.. code-block:: default


    options = {id(clr): {'zipmap': 'columns'}}
    onx3 = to_onnx(clr, X_train, options=options,
                   target_opset={'': 14, 'ai.onnx.ml': 2})

    sess3 = rt.InferenceSession(onx3.SerializeToString(),
                                providers=['CPUExecutionProvider'])
    res3 = sess3.run(None, {'X': X_test})
    for i, out in enumerate(sess3.get_outputs()):
        print(
            f"output: '{out.name}' shape={res3[i].shape} values={res3[i][:2]}...")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    output: 'output_label' shape=(38,) values=[14 12]...
    output: 'i10' shape=(38,) values=[1.1451698e-05 1.0270620e-02]...
    output: 'i12' shape=(38,) values=[0.02034245 0.7854515 ]...
    output: 'i14' shape=(38,) values=[0.97964615 0.20427796]...
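
If a matrix is needed later, the per-class columns can be put back together.
This is a small sketch in plain Python, with a hypothetical ``columns``
dictionary that stands in for the named outputs. The values are rounded from
the output above.

.. code-block:: python

    # One column per class, as in the output above (values rounded).
    columns = {
        "i10": [1.145e-05, 0.01027],
        "i12": [0.02034, 0.78545],
        "i14": [0.97965, 0.20428],
    }

    names = list(columns)
    # zip(*...) transposes the columns into one row per observation.
    rows = [list(values) for values in zip(*(columns[name] for name in names))]
    print(names)
    print(rows)

With the session above, the names would come from ``sess3.get_outputs()`` and
the values from ``res3``, skipping the first output, which holds the predicted
label.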


.. GENERATED FROM PYTHON SOURCE LINES 97-99

Let's compare prediction time
+++++++++++++++++++++++++++++

.. GENERATED FROM PYTHON SOURCE LINES 99-119

.. code-block:: default


    print("Average time with ZipMap:")
    print(sum(repeat(lambda: sess.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)

    print("Average time without ZipMap:")
    print(sum(repeat(lambda: sess2.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)

    print("Average time without ZipMap but with columns:")
    print(sum(repeat(lambda: sess3.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)

    # Prediction is almost twice as fast without ZipMap
    # on this example.
    # The gain is even larger when the classes
    # are described with strings and not integers,
    # as the final result (list of dictionaries) may copy
    # the same information many times with onnxruntime.


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Average time with ZipMap:
    0.011111683095805347
    Average time without ZipMap:
    0.005920915305614472
    Average time without ZipMap but with columns:
    0.00971748341107741
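
To read these numbers as relative gains, each one can be divided into the
default timing. The sketch below only reuses the averages printed above. Each
value is the time in seconds for 100 calls, averaged over 10 repetitions, and
the exact figures will differ from one machine to another.

.. code-block:: python

    # Averages reported above: seconds for 100 calls, averaged over 10 runs.
    timings = {
        "zipmap=True": 0.011111683095805347,
        "zipmap=False": 0.005920915305614472,
        "zipmap='columns'": 0.00971748341107741,
    }

    baseline = timings["zipmap=True"]
    # A ratio above 1 means the option is faster than the default.
    speedups = {name: baseline / value for name, value in timings.items()}
    for name, ratio in speedups.items():
        print(f"{name}: {ratio:.2f}x the speed of the default")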


.. GENERATED FROM PYTHON SOURCE LINES 120-127

Option zipmap=False and output_class_labels=True
++++++++++++++++++++++++++++++++++++++++++++++++

Option ``zipmap=False`` looks like the better choice because it is
much faster, but the class labels are lost in the process. Option
``output_class_labels`` exposes the labels as a third output.

.. GENERATED FROM PYTHON SOURCE LINES 127-140

.. code-block:: default


    # to_onnx infers the input type from X_train and names the input 'X'.
    options = {id(clr): {'zipmap': False, 'output_class_labels': True}}
    onx4 = to_onnx(clr, X_train, options=options,
                   target_opset={'': 14, 'ai.onnx.ml': 2})

    sess4 = rt.InferenceSession(onx4.SerializeToString(),
                                providers=['CPUExecutionProvider'])
    res4 = sess4.run(None, {'X': X_test})
    print(res4[1][:2])
    print("probabilities type:", type(res4[1]))
    print("class labels:", res4[2])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[1.1451698e-05 2.0342449e-02 9.7964615e-01]
     [1.0270620e-02 7.8545147e-01 2.0427796e-01]]
    probabilities type: <class 'numpy.ndarray'>
    class labels: [10 12 14]
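
With the labels available as an output, dictionaries can be rebuilt on demand,
for example only for the few rows that get displayed. This is a minimal sketch
in plain Python. The helper name ``zip_rows`` is hypothetical and the values
are rounded from the output above.

.. code-block:: python

    def zip_rows(probabilities, class_labels, indices):
        """Build ZipMap-style dictionaries for the selected rows only."""
        return [dict(zip(class_labels, probabilities[i])) for i in indices]


    # Values rounded from the output above.
    probabilities = [
        [1.145e-05, 0.02034, 0.97965],
        [0.01027, 0.78545, 0.20428],
    ]
    class_labels = [10, 12, 14]
    print(zip_rows(probabilities, class_labels, [1]))

In the example above, the inputs would be ``res4[1]`` and ``res4[2]``. These
are numpy arrays, so the keys and values would be numpy scalars unless they
are converted first.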


.. GENERATED FROM PYTHON SOURCE LINES 141-142

Processing time.

.. GENERATED FROM PYTHON SOURCE LINES 142-147

.. code-block:: default


    print("Average time without ZipMap but with output_class_labels:")
    print(sum(repeat(lambda: sess4.run(None, {'X': X_test}),
                     number=100, repeat=10)) / 10)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Average time without ZipMap but with output_class_labels:
    0.006411535281222314


.. GENERATED FROM PYTHON SOURCE LINES 148-155

MultiOutputClassifier
+++++++++++++++++++++

This model is equivalent to several classifiers, one for every label
to predict. Instead of returning a matrix of probabilities, it returns
a sequence of matrices. Let's first modify the labels to get
a problem for a MultiOutputClassifier.

.. GENERATED FROM PYTHON SOURCE LINES 155-160

.. code-block:: default


    y = numpy.vstack([y, y + 100]).T
    y[::5, 1] = 1000  # Let's add a fourth class.
    print(y[:5])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    [[  10 1000]
     [  10  110]
     [  10  110]
     [  10  110]
     [  10  110]]
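
The effect of this relabelling is that the two targets no longer share the
same set of classes. The sketch below reproduces the idea in plain Python on a
short, made-up list of labels, which is an assumption made to keep the example
small.

.. code-block:: python

    # A short, made-up list of labels standing in for the iris targets.
    y = [10, 10, 12, 12, 14, 14]

    # Second target: the first one shifted by 100.
    pairs = [[label, label + 100] for label in y]
    # Every fifth row gets an extra class for the second target.
    for row in pairs[::5]:
        row[1] = 1000

    first_classes = sorted({row[0] for row in pairs})
    second_classes = sorted({row[1] for row in pairs})
    print(first_classes)
    print(second_classes)

The first target keeps three classes while the second one has four, so the
two probability matrices have different numbers of columns.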


.. GENERATED FROM PYTHON SOURCE LINES 161-162

Let's train a MultiOutputClassifier.

.. GENERATED FROM PYTHON SOURCE LINES 162-175

.. code-block:: default


    X_train, X_test, y_train, y_test = train_test_split(X, y)
    clr = MultiOutputClassifier(LogisticRegression(max_iter=500))
    clr.fit(X_train, y_train)
    print(clr)

    onx5 = to_onnx(clr, X_train, target_opset={'': 14, 'ai.onnx.ml': 2})

    sess5 = rt.InferenceSession(onx5.SerializeToString(),
                                providers=['CPUExecutionProvider'])
    res5 = sess5.run(None, {'X': X_test[:3]})
    print(res5)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    MultiOutputClassifier(estimator=LogisticRegression(max_iter=500))
    somewhere/workspace/onnxcustom/onnxcustom_UT_39_std/_venv/lib/python3.9/site-packages/skl2onnx/_parse.py:529: UserWarning: Option zipmap is ignored for model <class 'sklearn.multioutput.MultiOutputClassifier'>. Set option zipmap to False to remove this message.
      warnings.warn(
    [array([[ 14, 114],
           [ 14, 114],
           [ 14, 114]], dtype=int64), [array([[4.3274203e-04, 1.8162604e-01, 8.1794125e-01],
           [7.1996351e-04, 3.5074779e-01, 6.4853227e-01],
           [2.2366168e-05, 4.4211719e-02, 9.5576590e-01]], dtype=float32), array([[1.4372485e-03, 2.3755349e-01, 6.5266651e-01, 1.0834280e-01],
           [1.7852996e-03, 3.8362366e-01, 4.7284558e-01, 1.4174546e-01],
           [2.8334750e-04, 1.5673934e-01, 7.3674095e-01, 1.0623635e-01]],
          dtype=float32)]]


.. GENERATED FROM PYTHON SOURCE LINES 176-178

Option ``zipmap`` is ignored for this model, as the warning above says.
The class labels are missing but option ``output_class_labels``
can add them back as a third output.

.. GENERATED FROM PYTHON SOURCE LINES 178-190

.. code-block:: default


    onx6 = to_onnx(clr, X_train, target_opset={'': 14, 'ai.onnx.ml': 2},
                   options={'zipmap': False, 'output_class_labels': True})

    sess6 = rt.InferenceSession(onx6.SerializeToString(),
                                providers=['CPUExecutionProvider'])
    res6 = sess6.run(None, {'X': X_test[:3]})
    print("predicted labels", res6[0])
    print("predicted probabilities", res6[1])
    print("class labels", res6[2])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    predicted labels [[ 14 114]
     [ 14 114]
     [ 14 114]]
    predicted probabilities [array([[4.3274203e-04, 1.8162604e-01, 8.1794125e-01],
           [7.1996351e-04, 3.5074779e-01, 6.4853227e-01],
           [2.2366168e-05, 4.4211719e-02, 9.5576590e-01]], dtype=float32), array([[1.4372485e-03, 2.3755349e-01, 6.5266651e-01, 1.0834280e-01],
           [1.7852996e-03, 3.8362366e-01, 4.7284558e-01, 1.4174546e-01],
           [2.8334750e-04, 1.5673934e-01, 7.3674095e-01, 1.0623635e-01]],
          dtype=float32)]
    class labels [array([10, 12, 14], dtype=int64), array([ 110,  112,  114, 1000], dtype=int64)]
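
Each probability matrix goes with the array of class labels at the same
position. The sketch below, in plain Python, pairs them to recover the
predicted label of every target, assuming the columns of each matrix follow
the order of the matching label array. The values are rounded from the output
above.

.. code-block:: python

    # One matrix per target (values rounded from the output above).
    probabilities = [
        [[4.327e-04, 0.18163, 0.81794],
         [7.200e-04, 0.35075, 0.64853],
         [2.237e-05, 0.04421, 0.95577]],
        [[1.437e-03, 0.23755, 0.65267, 0.10834],
         [1.785e-03, 0.38362, 0.47285, 0.14175],
         [2.833e-04, 0.15674, 0.73674, 0.10624]],
    ]
    # One list of class labels per target.
    class_labels = [[10, 12, 14], [110, 112, 114, 1000]]

    predicted = []
    for n in range(len(probabilities[0])):  # loop over observations
        row = []
        for probs, labels in zip(probabilities, class_labels):
            # Label of the most probable class for this target.
            best = max(range(len(labels)), key=lambda i: probs[n][i])
            row.append(labels[best])
        predicted.append(row)
    print(predicted)

The result can be compared with the predicted labels printed above.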


.. GENERATED FROM PYTHON SOURCE LINES 191-192

**Versions used for this example**

.. GENERATED FROM PYTHON SOURCE LINES 192-198

.. code-block:: default


    print("numpy:", numpy.__version__)
    print("scikit-learn:", sklearn.__version__)
    print("onnx: ", onnx.__version__)
    print("onnxruntime: ", rt.__version__)
    print("skl2onnx: ", skl2onnx.__version__)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    numpy: 1.24.1
    scikit-learn: 1.2.0
    onnx:  1.13.0
    onnxruntime:  1.14.92+cpu
    skl2onnx:  1.13.1


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.926 seconds)


.. _sphx_glr_download_gyexamples_plot_dbegin_options_zipmap.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_dbegin_options_zipmap.py <plot_dbegin_options_zipmap.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_dbegin_options_zipmap.ipynb <plot_dbegin_options_zipmap.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_