.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gyexamples/plot_ebegin_float_double.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_gyexamples_plot_ebegin_float_double.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gyexamples_plot_ebegin_float_double.py:


.. _l-example-discrepencies-float-double:

Issues when switching to float
==============================

.. index:: float, double, discrepancies

Most models in :epkg:`scikit-learn` do computation with double,
not float. Most models in deep learning use float because
that is the most common type on GPUs. ONNX was initially
created to facilitate the deployment of deep learning models,
which explains why many converters assume the converted models
should use float. That assumption does not usually harm
the predictions: the conversion to float introduces small
discrepancies compared to double predictions.
That assumption usually holds if the prediction
function is continuous: if :math:`y = f(x)`, then
:math:`dy = f'(x) dx`. We can determine an upper bound
to the discrepancies:
:math:`\Delta(y) \leqslant \sup_x \left\Vert f'(x)\right\Vert \left\Vert dx \right\Vert`.
*dx* is the discrepancy introduced by a float conversion,
``dx = x - numpy.float32(x)``.
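
A quick numerical check of this bound, as a sketch: the linear function
``f`` below is an arbitrary choice, not part of the original example, for
which :math:`\sup_x \left\Vert f'(x)\right\Vert` is simply the absolute slope.

```python
import numpy

# Arbitrary continuous function chosen for illustration: f(x) = 3.5 x + 1,
# so sup |f'(x)| = 3.5.
slope = 3.5


def f(x):
    return slope * x + 1


rng = numpy.random.default_rng(0)
x = rng.uniform(-100, 100, size=1000)   # double precision inputs
x32 = x.astype(numpy.float32)           # the float conversion
dx = x - x32                            # error introduced by the cast
dy = f(x) - f(x32.astype(numpy.float64))  # effect on the output

# The output discrepancy stays within sup|f'| * |dx|.
bound = slope * numpy.abs(dx)
print(numpy.abs(dy).max(), bound.max())
```

For a linear function both numbers match up to double rounding; for any
differentiable function, the mean value theorem keeps the discrepancy below
the bound.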

However, that is not the case for every model. A decision tree
trained for a regression is not a continuous function. Therefore,
even a small *dx* may introduce a huge discrepancy. Let's look into
an example which always produces discrepancies and some ways
to overcome this situation.
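
The following sketch, with a hypothetical one-split tree and a threshold
not taken from the example, shows how a cast flips a comparison. Near 1,
consecutive float32 values are :math:`2^{-23}` apart, so :math:`1 + 2^{-30}`
rounds to 1 and falls on the left side of the threshold :math:`1 + 2^{-31}`,
while the double value lies on its right side.

```python
import numpy

# Hypothetical one-split "tree": returns 0 below the threshold, 100 above.
threshold = 1 + 2 ** -31          # exactly representable in double


def step(x):
    return numpy.where(x <= threshold, 0., 100.)


x = numpy.array([1 + 2 ** -30])   # slightly above the threshold in double
x32 = x.astype(numpy.float32)     # the cast rounds x to 1.0

print(step(x))                    # comparison in double
print(step(x32))                  # comparison after the cast to float
```

Whether the threshold itself is kept in double or cast to float, the cast
input 1.0 ends in the left leaf: a tiny *dx* changes the prediction by 100.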

.. contents::
    :local:

More into the issue
+++++++++++++++++++

The example below is built to fail.
It contains features with different orders
of magnitude, rounded to integers. A decision tree compares
features to thresholds. In most cases, float and double
comparisons give the same result. We denote by
:math:`[x]_{f32}` the conversion (or cast)
``numpy.float32(x)``.

.. math::

    x \leqslant y \Longleftrightarrow [x]_{f32} \leqslant [y]_{f32}

However, the probability that both comparisons give
different results is not zero. The following graph shows
the areas where they disagree.

.. GENERATED FROM PYTHON SOURCE LINES 53-107

.. code-block:: default

    from skl2onnx.sklapi import CastRegressor
    from mlprodict.onnxrt import OnnxInference
    from mlprodict.onnx_conv import to_onnx as to_onnx_extended
    from mlprodict.sklapi import OnnxPipeline
    from skl2onnx.sklapi import CastTransformer
    from skl2onnx import to_onnx
    from onnxruntime import InferenceSession
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline
    from sklearn.datasets import make_regression
    import numpy
    import matplotlib.pyplot as plt


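    def area_mismatch_rule(N, delta, factor, rule=None):
        """Compares ``x <= y`` in double and after a cast to float
        on a grid of (2N)^2 points around ``(factor, factor)``.
        *rule* casts *y* (``numpy.float32`` by default).
        Returns the coordinates of the points where both comparisons
        agree (xst, yst) and disagree (xsf, ysf)."""
        if rule is None:
            rule = lambda t: numpy.float32(t)
        xst = []
        yst = []
        xsf = []
        ysf = []
        for x in range(-N, N):
            for y in range(-N, N):
                # grid point, step delta * factor around factor
                dx = (1. + x * delta) * factor
                dy = (1. + y * delta) * factor
                # comparison in double...
                c1 = 1 if numpy.float64(dx) <= numpy.float64(dy) else 0
                # ...and after the cast to float
                c2 = 1 if numpy.float32(dx) <= rule(dy) else 0
                key = abs(c1 - c2)
                if key == 1:
                    # both comparisons disagree
                    xsf.append(dx)
                    ysf.append(dy)
                else:
                    xst.append(dx)
                    yst.append(dy)
        return xst, yst, xsf, ysf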


    delta = 36e-10
    factor = 1
    xst, yst, xsf, ysf = area_mismatch_rule(100, delta, factor)


    fig, ax = plt.subplots(1, 1, figsize=(5, 5))
    ax.plot(xst, yst, '.', label="agree")
    ax.plot(xsf, ysf, '.', label="disagree")
    ax.set_title("Region where x <= y and (float)x <= (float)y agree")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.plot([min(xst), max(xst)], [min(yst), max(yst)], 'k--')
    ax.legend()


.. image-sg:: /gyexamples/images/sphx_glr_plot_ebegin_float_double_001.png
   :alt: Region where x <= y and (float)x <= (float)y agree
   :srcset: /gyexamples/images/sphx_glr_plot_ebegin_float_double_001.png
   :class: sphx-glr-single-img




.. GENERATED FROM PYTHON SOURCE LINES 108-115

The pipeline and the data
+++++++++++++++++++++++++

We can now build an example where the learned decision tree
does many comparisons in this disagreement area. This is done
by rounding features to integers, which frequently happens
with categorical features.

.. GENERATED FROM PYTHON SOURCE LINES 115-135

.. code-block:: default


    X, y = make_regression(10000, 10)
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # Multiplies feature i by 2**i and truncates it to an integer:
    # features get different orders of magnitude and discrete values.
    Xi_train, yi_train = X_train.copy(), y_train.copy()
    Xi_test, yi_test = X_test.copy(), y_test.copy()
    for i in range(X.shape[1]):
        Xi_train[:, i] = (Xi_train[:, i] * 2 ** i).astype(numpy.int64)
        Xi_test[:, i] = (Xi_test[:, i] * 2 ** i).astype(numpy.int64)

    max_depth = 10

    # A normalizer followed by a decision tree.
    model = Pipeline([
        ('scaler', StandardScaler()),
        ('dt', DecisionTreeRegressor(max_depth=max_depth))
    ])

    model.fit(Xi_train, yi_train)



.. GENERATED FROM PYTHON SOURCE LINES 136-142

The discrepencies
+++++++++++++++++

Let's reuse the function implemented in the
first example :ref:`l-diff-dicrepencies` and
look at the conversion.

.. GENERATED FROM PYTHON SOURCE LINES 142-164

.. code-block:: default


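    def diff(p1, p2):
        """Returns the maximum absolute difference and the maximum
        relative difference (relative to *p1*) between two predictions."""
        p1 = p1.ravel()
        p2 = p2.ravel()
        d = numpy.abs(p2 - p1)
        return d.max(), (d / numpy.abs(p1)).max()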


    onx = to_onnx(model, Xi_train[:1].astype(numpy.float32),
                  target_opset=17)

    sess = InferenceSession(onx.SerializeToString(),
                            providers=['CPUExecutionProvider'])

    X32 = Xi_test.astype(numpy.float32)

    skl = model.predict(X32)
    ort = sess.run(None, {'X': X32})[0]

    print(diff(skl, ort))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (171.28042588123503, 1.176697659575544)


.. GENERATED FROM PYTHON SOURCE LINES 165-198

The discrepancies are significant.
The ONNX model keeps float at every step.

.. blockdiag::

   diagram {
     x_float32 -> normalizer -> y_float32 -> dtree -> z_float32
   }

In :epkg:`scikit-learn`:

.. blockdiag::

   diagram {
     x_float32 -> normalizer -> y_double -> dtree -> z_double
   }

CastTransformer
+++++++++++++++

We could try to use double everywhere. Unfortunately,
:epkg:`ONNX ML Operators` only allow float coefficients
for the operator *TreeEnsembleRegressor*. We may
compromise by casting the output of the normalizer into
float in the :epkg:`scikit-learn` pipeline.

.. blockdiag::

   diagram {
     x_float32 -> normalizer -> y_double ->
     cast -> y_float -> dtree -> z_float
   }
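
A cast step has nothing to learn; it only converts its input. The class
below is a hypothetical duck-typed stand-in written with numpy only, to show
what the ``cast`` step of the diagram does. It is a sketch, not the
implementation of ``CastTransformer`` in skl2onnx.

```python
import numpy


class Float32Cast:
    """Hypothetical cast step: learns nothing, converts its input
    to *dtype* (float32 by default)."""

    def __init__(self, dtype=numpy.float32):
        self.dtype = dtype

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return numpy.asarray(X).astype(self.dtype)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Output of a normalizer computed in double...
y_double = (numpy.array([[1.0, 2.0], [3.0, 4.0]]) - 0.1) / 3.0
# ...cast to float before reaching the decision tree.
y_float = Float32Cast().fit_transform(y_double)
print(y_float.dtype)
```

Because the tree is then trained on float values, its thresholds are chosen
between float values rather than between double values.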


.. GENERATED FROM PYTHON SOURCE LINES 198-208

.. code-block:: default


    model2 = Pipeline([
        ('scaler', StandardScaler()),
        # casts the normalized features into float
        ('cast', CastTransformer()),
        ('dt', DecisionTreeRegressor(max_depth=max_depth))
    ])

    model2.fit(Xi_train, yi_train)



.. GENERATED FROM PYTHON SOURCE LINES 209-210

Let's measure the discrepancies again.

.. GENERATED FROM PYTHON SOURCE LINES 210-222

.. code-block:: default


    onx2 = to_onnx(model2, Xi_train[:1].astype(numpy.float32),
                   target_opset=17)

    sess2 = InferenceSession(onx2.SerializeToString(),
                             providers=['CPUExecutionProvider'])

    skl2 = model2.predict(X32)
    ort2 = sess2.run(None, {'X': X32})[0]

    print(diff(skl2, ort2))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (171.2804258812351, 1.176697659575544)


.. GENERATED FROM PYTHON SOURCE LINES 223-228

That still fails because the normalizer uses different
types in :epkg:`scikit-learn` and in :epkg:`ONNX`:
the ONNX normalizer still computes in float. The cast still
happens and *dx* is still there. To remove it, the ONNX
normalizer needs to compute in double.
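
A minimal illustration of why both normalizers disagree, assuming
hypothetical scaler parameters ``mean=0.1`` and ``scale=3.0``: computing
:math:`(x - mean) / scale` in float rounds after every operation, while the
double computation rounds only once, at the final cast. The two float32
results differ.

```python
import numpy

# Hypothetical scaler parameters, stored in double.
mean, scale = 0.1, 3.0
x = numpy.array([1.0], dtype=numpy.float32)   # the float input

# Normalization computed in float (a float ONNX graph).
z_float = (x - numpy.float32(mean)) / numpy.float32(scale)

# Normalization computed in double, then cast to float.
z_double = ((x.astype(numpy.float64) - mean) / scale).astype(numpy.float32)

print(z_float, z_double)  # both float32, yet not equal
```

A difference of one float32 unit is enough to put a value on the other side
of a tree threshold.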

.. GENERATED FROM PYTHON SOURCE LINES 228-249

.. code-block:: default


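    model3 = Pipeline([
        # converts the float input into double before the scaler
        ('cast64', CastTransformer(dtype=numpy.float64)),
        ('scaler', StandardScaler()),
        # casts the scaled features back to float for the tree
        ('cast', CastTransformer()),
        ('dt', DecisionTreeRegressor(max_depth=max_depth))
    ])

    model3.fit(Xi_train, yi_train)
    # 'div_cast' is a skl2onnx converter option for StandardScaler,
    # meant to keep the scaling in double in the ONNX graph
    onx3 = to_onnx(model3, Xi_train[:1].astype(numpy.float32),
                   options={StandardScaler: {'div': 'div_cast'}},
                   target_opset=17)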

    sess3 = InferenceSession(onx3.SerializeToString(),
                             providers=['CPUExecutionProvider'])

    skl3 = model3.predict(X32)
    ort3 = sess3.run(None, {'X': X32})[0]

    print(diff(skl3, ort3))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (2.8991464660066413e-05, 5.806761687047564e-08)


.. GENERATED FROM PYTHON SOURCE LINES 250-273

It works. That also means that it is difficult to change
the computation type when a pipeline includes a discontinuous
function. It is better to keep the same type all along
the pipeline before a decision tree.

Sledgehammer
++++++++++++

The idea here is to always train the next step on
the ONNX outputs of the previous steps. That way, every step
of the pipeline is trained on the outputs of the converted
steps before it.

* Trains the first step.
* Converts the step into ONNX.
* Computes ONNX outputs.
* Trains the second step on these outputs.
* Converts the second step into ONNX.
* Merges it with the first step.
* Computes ONNX outputs of the merged two first steps.
* ...

It is implemented in
class :epkg:`OnnxPipeline`.
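
The steps above can be sketched as follows. ``ToyScaler``, ``toy_convert``
and ``fit_on_converted_outputs`` are hypothetical names: the "converter"
only emulates a float ONNX graph with numpy, whereas :epkg:`OnnxPipeline`
really converts each step into ONNX and runs it.

```python
import numpy


class ToyScaler:
    """Hypothetical stand-in for StandardScaler."""

    def fit(self, X, y=None):
        self.mean_ = X.mean(axis=0)
        self.scale_ = X.std(axis=0)
        return self


def toy_convert(step):
    """Hypothetical 'converter': returns a function computing the fitted
    step in float32, the way a float ONNX graph would."""
    mean = step.mean_.astype(numpy.float32)
    scale = step.scale_.astype(numpy.float32)
    return lambda X: (X.astype(numpy.float32) - mean) / scale


def fit_on_converted_outputs(steps, X, y, convert):
    """Fits every step on the outputs of the converted previous steps.
    Returns the converted steps and the features the final estimator
    should be trained on."""
    converted = []
    current = X
    for step in steps:
        step.fit(current, y)      # trains the step
        fn = convert(step)        # converts it
        converted.append(fn)      # merges it with the previous ones
        current = fn(current)     # outputs of the merged converted steps
    return converted, current


X = numpy.array([[1., 10.], [2., 20.], [3., 30.], [4., 40.]])
converted, features = fit_on_converted_outputs(
    [ToyScaler(), ToyScaler()], X, None, toy_convert)
# *features* is what the final estimator (a decision tree) is trained on.
print(features.dtype)
```

The final estimator never sees a value the converted pipeline cannot
produce, which removes the *dx* between training and inference.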

.. GENERATED FROM PYTHON SOURCE LINES 273-282

.. code-block:: default


    model_onx = OnnxPipeline([
        ('scaler', StandardScaler()),
        ('dt', DecisionTreeRegressor(max_depth=max_depth))
    ])

    model_onx.fit(Xi_train, yi_train)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    OnnxPipeline(steps=[('scaler',
                         OnnxTransformer(onnx_bytes=b'\x08\x08\x12\x08skl2on...ml\x10\x01B\x04\n\x00\x10\x11', output_name=None, enforce_float32=True, runtime='python')),
                        ('dt', DecisionTreeRegressor(max_depth=10))])

.. GENERATED FROM PYTHON SOURCE LINES 283-284

Let's convert it and measure the discrepancies.

.. GENERATED FROM PYTHON SOURCE LINES 284-296

.. code-block:: default


    onx4 = to_onnx(model_onx, Xi_train[:1].astype(numpy.float32),
                   target_opset=17)

    sess4 = InferenceSession(onx4.SerializeToString(),
                             providers=['CPUExecutionProvider'])

    skl4 = model_onx.predict(X32)
    ort4 = sess4.run(None, {'X': X32})[0]

    print(diff(skl4, ort4))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (2.8991464660066413e-05, 5.806761687047564e-08)


.. GENERATED FROM PYTHON SOURCE LINES 297-298

It also works, in a simpler way.

.. GENERATED FROM PYTHON SOURCE LINES 300-320

No discrepancies at all?
++++++++++++++++++++++++

Is it possible to get no error at all?
There is one major obstacle: :epkg:`scikit-learn`
stores the predicted value of every leaf as a double
(`_tree.pyx - _get_value_ndarray
<https://github.com/scikit-learn/scikit-learn/blob/master/
sklearn/tree/_tree.pyx#L1096>`_), whereas :epkg:`ONNX` defines
the predicted values as floats: :epkg:`TreeEnsembleRegressor`.
What can we do to solve it?
What if we could extend the ONNX specifications to support
doubles instead of floats?
We reuse what was developed in example
`Other way to convert <http://www.xavierdupre.fr/app/
mlprodict/helpsphinx/notebooks/onnx_discrepencies.html
?highlight=treeensembleregressordouble#other-way-to-convert>`_
and a custom ONNX node `TreeEnsembleRegressorDouble
<http://www.xavierdupre.fr/app/mlprodict/helpsphinx/api/onnxrt_ops.html
?highlight=treeensembleregressordouble#treeensembleregressordouble>`_.
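
To get an idea of the remaining error, the sketch below rounds hypothetical
double leaf values to float, as a float-only *TreeEnsembleRegressor* would
store them. Rounding to the nearest float32 keeps the relative error below
:math:`2^{-24} \approx 5.96 \times 10^{-8}`, which is consistent with the
maximum relative difference of about :math:`5.8 \times 10^{-8}` measured
earlier with ``model3``.

```python
import numpy

# Hypothetical leaf values, computed in double.
rng = numpy.random.default_rng(0)
leaves = rng.normal(scale=200., size=1000)

# What a float-only tree operator can store instead.
leaves32 = leaves.astype(numpy.float32).astype(numpy.float64)

rel = numpy.abs(leaves32 - leaves) / numpy.abs(leaves)
# Rounding to the nearest float32 keeps the relative error below 2**-24.
print(rel.max(), 2. ** -24)
```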

.. GENERATED FROM PYTHON SOURCE LINES 320-331

.. code-block:: default


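    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(Xi_train, yi_train)

    # mlprodict's converter with double inputs; rewrite_ops=True lets it
    # use its own converters, which support double for trees
    model_onx = to_onnx_extended(tree, Xi_train[:1].astype(numpy.float64),
                                 rewrite_ops=True, target_opset=17)

    # mlprodict's Python runtime, which can run the double tree operator
    oinf5 = OnnxInference(model_onx, runtime='python_compiled')
    print(oinf5)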


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    OnnxInference(...)
        def compiled_run(dict_inputs, yield_ops=None, context=None, attributes=None):
            if yield_ops is not None:
                raise NotImplementedError('yields_ops should be None.')
            # inputs
            X = dict_inputs['X']
            (tree_ensemble_cast, ) = n0_treeensembleregressor_3(X)
            (variable, ) = n1_cast(tree_ensemble_cast)
            return {
                'variable': variable,
            }


.. GENERATED FROM PYTHON SOURCE LINES 332-333

Let's measure the discrepancies.

.. GENERATED FROM PYTHON SOURCE LINES 333-338

.. code-block:: default


    X64 = Xi_test.astype(numpy.float64)
    skl5 = tree.predict(X64)
    ort5 = oinf5.run({'X': X64})['variable']


.. GENERATED FROM PYTHON SOURCE LINES 339-340

Perfect, no discrepancies at all.

.. GENERATED FROM PYTHON SOURCE LINES 340-343

.. code-block:: default


    print(diff(skl5, ort5))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (0.0, 0.0)


.. GENERATED FROM PYTHON SOURCE LINES 344-352

CastRegressor
+++++++++++++

The previous example demonstrated that the type of
the predicted values explains the small differences between
:epkg:`scikit-learn` and :epkg:`onnxruntime`. But that solution does not
work with the standard ONNX operators. Another option is to cast
the predictions into floats in the :epkg:`scikit-learn` pipeline.
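
The idea can be sketched with two hypothetical numpy-only classes: a
one-split regressor storing its leaf values in double, and a wrapper in the
spirit of ``CastRegressor`` that casts its predictions to float32. This is an
assumption about the principle, not the skl2onnx implementation. Once cast,
the predictions are exactly the float32 leaf values a float-only tree
operator stores, so both sides agree bit for bit when they reach the same
leaf.

```python
import numpy


class ToyStump:
    """Hypothetical one-split regressor storing its leaf values in double."""

    def fit(self, X, y):
        self.threshold_ = numpy.median(X[:, 0])
        left = X[:, 0] <= self.threshold_
        self.values_ = numpy.array([y[left].mean(), y[~left].mean()])
        return self

    def predict(self, X):
        return numpy.where(X[:, 0] <= self.threshold_,
                           self.values_[0], self.values_[1])


class Float32OutputRegressor:
    """Hypothetical wrapper: fits the wrapped regressor unchanged
    and casts its predictions to float32."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        return self.estimator.predict(X).astype(numpy.float32)


X = numpy.arange(10, dtype=numpy.float64).reshape(-1, 1)
y = numpy.sqrt(X[:, 0])            # arbitrary target
model = Float32OutputRegressor(ToyStump()).fit(X, y)
pred = model.predict(X)
# Leaf values as a float-only tree operator would store them.
stored = model.estimator.values_.astype(numpy.float32)
print(pred.dtype, stored)
```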

.. GENERATED FROM PYTHON SOURCE LINES 352-368

.. code-block:: default


    ctree = CastRegressor(DecisionTreeRegressor(max_depth=max_depth))
    ctree.fit(Xi_train, yi_train)

    onx6 = to_onnx(ctree, Xi_train[:1].astype(numpy.float32),
                   target_opset=17)

    sess6 = InferenceSession(onx6.SerializeToString(),
                             providers=['CPUExecutionProvider'])

    skl6 = ctree.predict(X32)
    ort6 = sess6.run(None, {'X': X32})[0]

    print(diff(skl6, ort6))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    (0.0, 0.0)


.. GENERATED FROM PYTHON SOURCE LINES 369-370

Success!


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  3.212 seconds)


.. _sphx_glr_download_gyexamples_plot_ebegin_float_double.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example


    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_ebegin_float_double.py <plot_ebegin_float_double.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_ebegin_float_double.ipynb <plot_ebegin_float_double.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_