module onnxrt.onnx_inference

Inheritance diagram of mlprodict.onnxrt.onnx_inference

Short summary

module mlprodict.onnxrt.onnx_inference

Implements a class able to compute the predictions from an ONNX model.






Loads an ONNX file or object or stream. Computes the output of the ONNX graph. Several runtimes …





Returns the names of all inputs. It does not include the optional inputs.


Returns the names and shapes of all inputs. This method assumes all inputs are tensors. It does not include …


Returns the names, shapes, types of all inputs. This method assumes all inputs are tensors. It does not …


Returns the list of optional inputs (the model has an initializer with the same name as one input).


Returns the names of all outputs.


Returns the names and shapes of all outputs. This method assumes all outputs are tensors.


Returns the names, shapes, types of all outputs. This method assumes all outputs are tensors. It does not …

Static Methods








Returns the ONNX version of a node.


To pickle the object.





To unpickle the object.




Rewrites the run function in Python, compiles it, and adds it as a method.


Looks into every node of the graph to see if there is a way to do the computation inplace. By default (input_inplace=False), …



Prepares the instance to deliver predictions.



Executes a compiled version of _run_sequence_runtime(), compiled with method _build_compile_run(). …



Sets shapes based on shape inference relying on the runtime. The values are stored in every node.


Checks that the outputs have the expected type. The function returns the list of mismatches.


Builds every possible ONNX file which computes one specific intermediate output from the inputs. …


Checks that the model follows ONNX conventions.


Shows the sequence of nodes to run if runtime=='python'.


This function returns a dictionary {(kind, name): (order, op)}; name can be a node name or a result name. In …


Returns the profiling after a couple of executions.


Maps every name to one integer to avoid using dictionaries when running the predictions.


Computes expected shapes.


Reduces the memory footprint as much as possible.


Computes the predictions for this onnx graph.


Executes the graphs with the given inputs, then adds the intermediate results into ONNX nodes in the original graph. …


Infers the shape of the outputs with the onnx package.


Switches all initializers to numpy.float64. If model is None, a simple cast is done. Otherwise, the function …


Produces a graph to facilitate the execution. One example:



class mlprodict.onnxrt.onnx_inference.OnnxInference(onnx_or_bytes_or_stream, runtime=None, skip_run=False, inplace=True, input_inplace=False, ir_version=None, target_opset=None, runtime_options=None, session_options=None, inside_loop=False, static_inputs=None, new_outputs=None, new_opset=None, existing_functions=None)

Bases: object

Loads an ONNX file or object or stream. Computes the output of the ONNX graph. Several runtimes are available.

  • 'python': the runtime implements every onnx operator needed to run a scikit-learn model by using numpy or C++ code.

  • 'python_compiled': the same runtime as the previous one, except that every operator is called from a compiled function (_build_compile_run) instead of a method going through the list of operators.

  • 'onnxruntime1': uses onnxruntime (or onnxruntime1-cuda, …)

  • 'onnxruntime2': this mode is mostly used to debug, as Python handles calling every operator but onnxruntime is called for each of them. This process may fail due to wrong type inference, especially if the graph includes custom nodes; in that case, it is better to compute the output of intermediate nodes. It is much slower, as every node is computed for every output, but more robust.

Parameters:

  • onnx_or_bytes_or_stream – onnx object, bytes, filename or stream

  • runtime – runtime options

  • skip_run – do not build the runtime

  • inplace – use inplace computation as much as possible

  • input_inplace – the computation is allowed to overwrite the input, see _guess_inplace

  • ir_version – if not None, overwrite the default version

  • target_opset – used to overwrite target_opset

  • runtime_options – specific options for the runtime

  • inside_loop – tells the runtime the graph is meant to be repeated multiple times (in that case, inputs and outputs may share the same name)

  • static_inputs – Loop can use static variables, variables from the graph which runs the loop (an iterable of strings)

  • new_outputs – if the loading fails, it might be worth cutting the graph; if not None, the graph is cut so that these new_outputs become the final outputs

  • new_opset – overwrites the main opset and replaces it with this new one

  • existing_functions – a model may contain several local functions, this parameter is used when a local function is calling another local function previously defined.

Among the possible runtime_options, there are:

  • enable_profiling: enables profiling for onnxruntime

  • session_options: an instance of SessionOptions

  • ir_version: changes ir_version

Changed in version 0.9: Parameter existing_functions was added. The device parameter was removed (see runtime). Runtime onnxruntime1-cuda was added.



Returns the ONNX version of a node.



To pickle the object.


__init__(onnx_or_bytes_or_stream, runtime=None, skip_run=False, inplace=True, input_inplace=False, ir_version=None, target_opset=None, runtime_options=None, session_options=None, inside_loop=False, static_inputs=None, new_outputs=None, new_opset=None, existing_functions=None)




To unpickle the object.
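A common way to make an object like this picklable is to store only the serialized model and rebuild the runtime state when unpickling. The sketch below assumes a hypothetical ModelHolder class holding model bytes plus a non-picklable derived attribute; it only illustrates the __getstate__/__setstate__ pattern and is not taken from mlprodict.

```python
import pickle
import threading


class ModelHolder:
    """Hypothetical holder of a serialized model plus derived runtime state."""

    def __init__(self, model_bytes):
        self.model_bytes = model_bytes
        self._build_runtime()

    def _build_runtime(self):
        # Stand-in for building a runtime; a lock cannot be pickled.
        self.lock = threading.Lock()
        self.n_bytes = len(self.model_bytes)

    def __getstate__(self):
        # Pickle only what is needed to rebuild the object.
        return {"model_bytes": self.model_bytes}

    def __setstate__(self, state):
        # Restore the serialized model, then rebuild the derived state.
        self.model_bytes = state["model_bytes"]
        self._build_runtime()


restored = pickle.loads(pickle.dumps(ModelHolder(b"\x08\x07")))
print(restored.n_bytes)
```

Without __getstate__, pickle would try to serialize the lock and raise a TypeError.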






Rewrites the run function in Python, compiles it, and adds it as a method.


debug – insert debugging code


Returns: method name, callable object

Run a model with runtime ‘python_compiled’

The following code trains a model and computes the predictions with runtime 'python_compiled'. It converts the ONNX graph into a Python function which calls every operator. Its code is printed below.


import numpy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, __ = train_test_split(X, y, random_state=11)
y_train = y_train.astype(numpy.float32)
clr = AdaBoostClassifier(
    # depth-1 trees are AdaBoost's default weak learners
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=3)
clr.fit(X_train, y_train)

model_def = to_onnx(clr, X_train.astype(numpy.float32))

oinf2 = OnnxInference(model_def, runtime='python_compiled')
print(oinf2.run({'X': X_test[:5]}))

# prints out the python function equivalent
# to the onnx graph
print(oinf2)

    {'output_label': array([2, 2, 1, 1, 2]), 'output_probability': []}
        def compiled_run(dict_inputs, yield_ops=None, context=None, attributes=None):
            if yield_ops is not None:
                raise NotImplementedError('yields_ops should be None.')
            # init: classes (classes)
            # init: clip_min (clip_min)
            # init: inverted_n_classes (inverted_n_classes)
            # init: mul_operand (mul_operand)
            # init: n_classes_minus_one (n_classes_minus_one)
            # init: shape_tensor (shape_tensor)
            # init: shape_tensor3 (shape_tensor3)
            # init: zero_scalar (zero_scalar)
            # inputs
            X = dict_inputs['X']
            (elab_name_1, eprob_name_1, ) = n0_treeensembleclassifier_1(X)
            (elab_name_0, eprob_name_0, ) = n1_treeensembleclassifier_1(X)
            (clipped_proba1, ) = n2_clip_11(eprob_name_1, clip_min)
            (clipped_proba, ) = n3_clip_11(eprob_name_0, clip_min)
            (log_proba1, ) = n4_log(clipped_proba1)
            (log_proba, ) = n5_log(clipped_proba)
            (reduced_proba1, ) = n6_reducesum_11(log_proba1)
            (reduced_proba, ) = n7_reducesum_11(log_proba)
            (reshaped_result1, ) = n8_reshape_5(reduced_proba1, shape_tensor)
            (reshaped_result, ) = n9_reshape_5(reduced_proba, shape_tensor)
            (prod_result1, ) = n10_mul(reshaped_result1, inverted_n_classes)
            (prod_result, ) = n11_mul(reshaped_result, inverted_n_classes)
            (sub_result1, ) = n12_sub(log_proba1, prod_result1)
            (sub_result, ) = n13_sub(log_proba, prod_result)
            (samme_proba, ) = n14_mul(sub_result, n_classes_minus_one)
            (samme_proba1, ) = n15_mul(sub_result1, n_classes_minus_one)
            (summation_prob, ) = n16_sum(samme_proba, samme_proba1)
            (div_result, ) = n17_div(summation_prob, n_classes_minus_one)
            (exp_operand, ) = n18_mul(div_result, mul_operand)
            (exp_result, ) = n19_exp(exp_operand)
            (reduced_exp_result, ) = n20_reducesum_11(exp_result)
            (normaliser, ) = n21_reshape_5(reduced_exp_result, shape_tensor)
            (cast_normaliser, ) = n22_cast(normaliser)
            (comparison_result, ) = n23_equal(cast_normaliser, zero_scalar)
            (cast_output, ) = n24_cast(comparison_result)
            (zero_filtered_normaliser, ) = n25_add(normaliser, cast_output)
            (probabilities, ) = n26_div(exp_result, zero_filtered_normaliser)
            (argmax_output, ) = n27_argmax_12(probabilities)
            (output_probability, ) = n28_zipmap(probabilities)
            (array_feature_extractor_result, ) = n29_arrayfeatureextractor(classes, argmax_output)
            (reshaped_result2, ) = n30_reshape_5(array_feature_extractor_result, shape_tensor3)
            (label, ) = n31_cast(reshaped_result2)
            (output_label, ) = n32_cast(label)
            return {
                'output_label': output_label,
                'output_probability': output_probability,
            }

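The idea behind 'python_compiled', writing the source of one function that calls every operator in order and compiling it, can be illustrated with a standalone sketch. It assumes a toy graph of single-output nodes and a hypothetical OPS table mapping op types to numpy callables; it mimics the layout of the printed compiled_run but is not mlprodict's _build_compile_run.

```python
import numpy

# Hypothetical operator table mapping an op type to a numpy callable.
OPS = {"Mul": numpy.multiply, "Add": numpy.add, "Exp": numpy.exp}


def build_compiled_run(nodes, input_names, output_names, inits):
    """Writes the source of one function calling every node in order,
    compiles it and returns (source, function).

    nodes: list of (op_type, input_names, output_name), already sorted.
    inits: constant tensors, exposed as globals of the generated code.
    """
    context = dict(inits)
    lines = ["def compiled_run(dict_inputs):"]
    for name in input_names:
        lines.append(f"    {name} = dict_inputs[{name!r}]")
    for i, (op_type, args, out) in enumerate(nodes):
        fname = f"n{i}_{op_type.lower()}"
        context[fname] = OPS[op_type]  # every operator becomes a global
        lines.append(f"    {out} = {fname}({', '.join(args)})")
    result = ", ".join(f"{o!r}: {o}" for o in output_names)
    lines.append(f"    return {{{result}}}")
    source = "\n".join(lines)
    exec(compile(source, "<compiled_run>", "exec"), context)
    return source, context["compiled_run"]


nodes = [("Mul", ["X", "W"], "xw"), ("Add", ["xw", "B"], "z"), ("Exp", ["z"], "Y")]
source, run = build_compiled_run(
    nodes, ["X"], ["Y"],
    {"W": numpy.array([2.0, 3.0]), "B": numpy.array([0.5, -0.5])})
print(source)
print(run({"X": numpy.array([0.0, 1.0])}))
```

Once compiled, a prediction is a single Python call with no loop over the node list, which is the point of this runtime.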

static _get_type_property(info, prop)

Looks into every node of the graph to see if there is a way to do the computation inplace. By default (input_inplace=False), the function assumes inputs cannot be modified so the first node cannot do inplace computation. This function only works with the python runtime.


input_inplace – the computation is allowed to overwrite the input

This function checks that a node is used only once, in which case it can be modified by the next node. Nodes A and C can be overwritten by the computation. Node B cannot, as it is used by two nodes.

It does not handle specific cases such as node B being overwritten by node C without changing its shape while node D only needs the shape of B. In that case, B could be overwritten as well.
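The counting rule described above can be sketched in a few lines: a result consumed by exactly one node may be overwritten by that node, unless it is a graph input and input_inplace is False. Nodes are assumed to be (name, inputs, outputs) tuples, and only the first input of each node is considered; the function is a hypothetical simplification, not mlprodict's _guess_inplace.

```python
from collections import Counter


def guess_inplace(nodes, graph_inputs, input_inplace=False):
    """Returns the names of the nodes allowed to overwrite their first input.

    nodes: list of (node_name, input_names, output_names) in execution order.
    graph_inputs: names of the graph inputs.
    """
    # How many nodes consume each result.
    consumers = Counter(name for _, inputs, _ in nodes for name in inputs)
    allowed = []
    for node_name, inputs, _ in nodes:
        if not inputs:
            continue
        first = inputs[0]
        if first in graph_inputs and not input_inplace:
            continue  # the caller owns the inputs, they must stay untouched
        if consumers[first] == 1:
            allowed.append(node_name)
    return allowed


# r1 is consumed by n2 and n3, so neither of them may overwrite it.
nodes = [
    ("n0", ["X"], ["r0"]),
    ("n1", ["r0"], ["r1"]),
    ("n2", ["r1"], ["r2"]),
    ("n3", ["r1", "r2"], ["Y"]),
]
print(guess_inplace(nodes, {"X"}))
print(guess_inplace(nodes, {"X"}, input_inplace=True))
```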



Prepares the instance to deliver predictions.


_run_sequence_runtime(inputs, clean_right_away=False, intermediate=False, verbose=0, node_time=False, overwrite_types=None, yield_ops=None, fLOG=None, context=None, attributes=None)

_run_sequence_runtime_compiled(inputs, clean_right_away=False, intermediate=False, verbose=0, node_time=False, yield_ops=None, fLOG=None, context=None, attributes=None)

Executes a compiled version of _run_sequence_runtime, compiled with method _build_compile_run. Every parameter with a default value is ignored. Switch to runtime='python' to enable those.


_run_whole_runtime(inputs, clean_right_away=False, intermediate=False, verbose=0, node_time=False, overwrite_types=None, yield_ops=None, fLOG=None, context=None, attributes=None)

Sets shapes based on shape inference relying on the runtime. The values are stored in every node.


_validate_outputs(res, verbose=0, fLOG=None)

Checks that the outputs have the expected type. The function returns the list of mismatches.

  • res – results in a dictionary

  • verbose – verbosity

  • fLOG – logging function
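As a rough illustration of such a check, the sketch below compares the dtype of every result with an expected dtype and collects the mismatches. The expected mapping and the (name, expected, actual) tuple format are assumptions made for the example, not the format returned by _validate_outputs.

```python
import numpy


def validate_outputs(res, expected):
    """Returns a list of (name, expected dtype, actual dtype) mismatches.

    res: dictionary name -> array, as returned by a run.
    expected: dictionary name -> numpy dtype (hypothetical input).
    """
    mismatches = []
    for name, dtype in expected.items():
        if name not in res:
            # A missing output is reported with an actual dtype of None.
            mismatches.append((name, numpy.dtype(dtype), None))
            continue
        actual = numpy.asarray(res[name]).dtype
        if actual != numpy.dtype(dtype):
            mismatches.append((name, numpy.dtype(dtype), actual))
    return mismatches


res = {"label": numpy.array([0, 1], dtype=numpy.int64),
       "proba": numpy.array([[0.2, 0.8]], dtype=numpy.float64)}
mismatches = validate_outputs(res, {"label": numpy.int64, "proba": numpy.float32})
print(mismatches)
```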




build_intermediate(outputs=None, verbose=0, overwrite_types=None, fLOG=None)

Builds every possible ONNX file which computes one specific intermediate output from the inputs.

  • outputs – subset of outputs to get, None to get all outputs

  • overwrite_types – shape inference does not work all the time; this allows forcing types when building intermediate results, see select_model_inputs_outputs

  • verbose – displays intermediate information

  • fLOG – logging function
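Computing one intermediate result only requires the nodes it depends on. The sketch walks the graph backwards from a requested output; nodes are simplified to (inputs, outputs) pairs instead of ONNX NodeProto objects, and extract_subgraph is a hypothetical helper, not the method used by build_intermediate.

```python
def extract_subgraph(nodes, target):
    """Returns the indices, in execution order, of the nodes needed
    to compute `target`.

    nodes: list of (input_names, output_names) sorted in execution order.
    """
    producer = {}
    for i, (_, outputs) in enumerate(nodes):
        for name in outputs:
            producer[name] = i
    needed, stack = set(), [target]
    while stack:
        name = stack.pop()
        i = producer.get(name)  # None for graph inputs and initializers
        if i is None or i in needed:
            continue
        needed.add(i)
        stack.extend(nodes[i][0])
    return sorted(needed)


nodes = [
    (["X"], ["a"]),       # node 0
    (["X"], ["b"]),       # node 1
    (["a", "W"], ["c"]),  # node 2
    (["b", "c"], ["Y"]),  # node 3
]
# One subgraph per intermediate result.
for name in ["a", "b", "c", "Y"]:
    print(name, extract_subgraph(nodes, name))
```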





Checks that the model follows ONNX conventions.



Shows the sequence of nodes to run if runtime=='python'.



This function returns a dictionary {(kind, name): (order, op)}; name can be a node name or a result name. For a result name, it gets the same execution order as the node which created it. The function returns None if the order is not available (the selected runtime does not return it). kind is either ‘node’ or ‘node’. If two nodes have the same name, the returned order is the last one. Initializers get an execution order equal to -1, inputs 0, and all other results are >= 1.

New in version 0.7.
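The ordering convention above can be sketched as follows: initializers get -1, inputs 0, each node its 1-based position in the sequence, and every result inherits the order of the node producing it. The kind labels 'node' and 'result', the None placeholder for op, and the input format are assumptions made for the illustration.

```python
def execution_order(inits, inputs, sequence):
    """Returns {(kind, name): (order, op)}.

    inits, inputs: lists of names.
    sequence: list of (node_name, op_type, output_names) in execution order.
    The kinds 'node' and 'result' are hypothetical labels.
    """
    res = {}
    for name in inits:
        res["result", name] = (-1, None)
    for name in inputs:
        res["result", name] = (0, None)
    for order, (node_name, op_type, outputs) in enumerate(sequence, start=1):
        # If two nodes share a name, the last one wins.
        res["node", node_name] = (order, op_type)
        for out in outputs:
            # A result gets the order of the node which created it.
            res["result", out] = (order, op_type)
    return res


order = execution_order(
    ["W"], ["X"],
    [("mm", "MatMul", ["xw"]), ("relu", "Relu", ["Y"])])
print(order)
```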



Returns the profiling after a couple of executions.


as_df – return the results as a dataframe (True)


Returns: dataframe or list of dictionaries

New in version 0.6.



Maps every name to one integer to avoid using dictionaries when running the predictions.


name – outputs name
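A minimal version of this mapping assigns consecutive integers on first use, so later lookups reuse the same index and runtime values can live in a list instead of a dictionary. The NameIndex class and its index_of method are hypothetical names chosen for the illustration.

```python
class NameIndex:
    """Hypothetical helper assigning a stable integer to every result name."""

    def __init__(self):
        self._index = {}

    def index_of(self, name):
        # The first call for a name creates a new index, later calls reuse it.
        if name not in self._index:
            self._index[name] = len(self._index)
        return self._index[name]

    def __len__(self):
        return len(self._index)


idx = NameIndex()
print([idx.index_of(n) for n in ["X", "a", "Y", "a"]])

# Runtime values can then be stored in a list instead of a dictionary.
values = [None] * len(idx)
values[idx.index_of("a")] = 3.0
```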





Computes expected shapes.


Returns: dictionary of shapes


property input_names

Returns the names of all inputs. It does not include the optional inputs.

Changed in version 0.6: The list does not include optional inputs anymore.


property input_names_shapes

Returns the names and shapes of all inputs. This method assumes all inputs are tensors. It does not include the optional inputs.

Changed in version 0.6: The list does not include optional inputs anymore.


property input_names_shapes_types

Returns the names, shapes, types of all inputs. This method assumes all inputs are tensors. It does not include the optional inputs.

Changed in version 0.6: The list does not include optional inputs anymore.


property optional_inputs

Returns the list of optional inputs (the model has an initializer with the same name as one input).
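Following this definition, optional inputs are the graph inputs that also have an initializer of the same name, while input_names excludes them. A sketch with plain lists of names, an assumption standing in for reading the ONNX graph; split_inputs is a hypothetical helper.

```python
def split_inputs(graph_inputs, initializer_names):
    """Returns (required inputs, optional inputs), keeping the input order."""
    inits = set(initializer_names)
    required = [name for name in graph_inputs if name not in inits]
    optional = [name for name in graph_inputs if name in inits]
    return required, optional


# 'alpha' is an input with a default value stored as an initializer.
required, optional = split_inputs(["X", "alpha"], ["alpha", "W"])
print(required, optional)
```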


property output_names

Returns the names of all outputs.


property output_names_shapes

Returns the names and shapes of all outputs. This method assumes all outputs are tensors.


property output_names_shapes_types

Returns the names, shapes, types of all outputs. This method assumes all outputs are tensors. It does not include the optional outputs.



Reduces the memory footprint as much as possible.


pickable – keeps a pickle object?


run(inputs, clean_right_away=False, intermediate=False, verbose=0, node_time=False, overwrite_types=None, yield_ops=None, fLOG=None, context=None, attributes=None)

Computes the predictions for this onnx graph.

  • inputs – inputs as dictionary or a dataframe

  • clean_right_away – clean the intermediate outputs as soon as they are not needed

  • intermediate – returns a dictionary of intermediate variables instead of the results only

  • verbose – display information while predicting

  • node_time – measure time of each node

  • overwrite_types – shape inference does not work all the time; this allows forcing types when building intermediate results, see select_model_inputs_outputs

  • yield_ops – dictionary to overwrite the output of operator YieldOp

  • fLOG – logging function if verbose > 0

  • context – local variables, needed when this object is a subgraph

  • attributes – used when this class runs a FunctionProto, to store the values of the attributes of the function


Returns: outputs as a dictionary, and a second dictionary with the time spent in each node if node_time is True

Computes predictions with any runtime

The following example compares predictions between scikit-learn and this runtime for the python runtime.


import numpy
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from mlprodict.onnxrt import OnnxInference
from mlprodict.onnx_conv import to_onnx

iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, _ = train_test_split(X, y)
clr = LinearRegression()
clr.fit(X_train, y_train)

exp = clr.predict(X_test[:5])
print(exp)

model_def = to_onnx(clr, X_train.astype(numpy.float32))
oinf = OnnxInference(model_def)
y = oinf.run({'X': X_test[:5]})
print(y)


    [0.01  1.365 0.893 0.052 2.04 ]
    {'variable': array([[0.01 ],
           [2.04 ]])}

The function returns all intermediate outputs if intermediate is True. With runtime onnxruntime1, if intermediate is True, the class first builds every ONNX graph cut to keep a single output, and converts each of them into an OnnxInference instance.

Changed in version 0.9: Parameter attributes was added.
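The options intermediate, clean_right_away and node_time map naturally onto a sequential loop over the nodes. The sketch below assumes single-output nodes given as (callable, input_names, output_name) with numpy callables; run_sequence is a hypothetical function showing how such options could be honoured, not the library's implementation of run.

```python
import time

import numpy


def run_sequence(nodes, inputs, outputs, intermediate=False,
                 clean_right_away=False, node_time=False):
    """Runs nodes in order.

    nodes: list of (callable, input_names, output_name) in execution order.
    Returns the outputs (all results if intermediate is True), plus a list
    of per-node timings if node_time is True.
    """
    values = dict(inputs)
    # Index of the last node reading each result.
    last_use = {}
    for i, (_, args, _) in enumerate(nodes):
        for name in args:
            last_use[name] = i
    timings = []
    for i, (fct, args, out) in enumerate(nodes):
        begin = time.perf_counter()
        values[out] = fct(*[values[name] for name in args])
        if node_time:
            timings.append({"node": i, "time": time.perf_counter() - begin})
        if clean_right_away and not intermediate:
            # Free results no later node needs, graph outputs excepted.
            for name in set(args):
                if last_use[name] == i and name not in outputs:
                    del values[name]
    res = values if intermediate else {name: values[name] for name in outputs}
    return (res, timings) if node_time else res


nodes = [(numpy.abs, ["X"], "a"),
         (numpy.sqrt, ["a"], "b"),
         (numpy.add, ["a", "b"], "Y")]
feeds = {"X": numpy.array([-4.0, 9.0])}
print(run_sequence(nodes, feeds, ["Y"]))
print(sorted(run_sequence(nodes, feeds, ["Y"], intermediate=True)))
res, timings = run_sequence(nodes, feeds, ["Y"], node_time=True)
```

Freeing a result right after its last consumer keeps peak memory low, at the cost of losing intermediate values, which is why the sketch skips cleaning when intermediate is True.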


run2onnx(inputs, verbose=0, fLOG=None, as_parameter=True, suffix='_DBG', param_name=None, node_type='DEBUG', domain='DEBUG', domain_opset=1, attributes=None)

Executes the graph with the given inputs, then adds the intermediate results into ONNX nodes in the original graph. Once saved, it can be inspected with a tool such as netron.

  • inputs – inputs as dictionary or a dataframe

  • verbose – display information while predicting

  • fLOG – logging function if verbose > 0

  • as_parameter – add new nodes with results as one parameter (True) or as initializer (False)

  • suffix – suffix to add to new results

  • param_name – name of the parameter to add (by default the result name); it can be a function param_name(result_name) -> parameter_name

  • node_type – type of the new node

  • domain – domain of the new node

  • domain_opset – opset for domain

  • attributes – values for attributes if this class runs a FunctionProto


Returns: outputs as a dictionary and the ONNX graph with new nodes

The following example shows how to use it.

New in version 0.7.



Infers the shape of the outputs with onnx package.


Returns: a new ONNX graph which defines the outputs.


switch_initializers_dtype(model=None, dtype_in=<class 'numpy.float32'>, dtype_out=<class 'numpy.float64'>)

Switches all initializers to numpy.float64. If model is None, a simple cast is done. Otherwise, the function assumes the model is a scikit-learn pipeline. This only works if the runtime is 'python'.

  • model – scikit-learn model or None

  • dtype_in – previous type

  • dtype_out – next type


Returns: done operations
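For the simple case (model is None), the switch amounts to casting every initializer stored with dtype_in. The sketch works on a plain dictionary of initializers, an assumption standing in for the runtime's internal storage, and returns the operations done as (name, old dtype, new dtype) tuples; switch_dtype is a hypothetical helper.

```python
import numpy


def switch_dtype(inits, dtype_in=numpy.float32, dtype_out=numpy.float64):
    """Casts in place every array of type dtype_in into dtype_out and
    returns the list of (name, old dtype, new dtype) operations done."""
    done = []
    for name, value in list(inits.items()):
        if value.dtype == dtype_in:
            inits[name] = value.astype(dtype_out)
            done.append((name, numpy.dtype(dtype_in), numpy.dtype(dtype_out)))
    return done


inits = {"coef": numpy.array([1.5, 2.0], dtype=numpy.float32),
         "classes": numpy.array([0, 1], dtype=numpy.int64)}
done = switch_dtype(inits)
print(done)
print(inits["coef"].dtype)
```

Integer initializers such as class labels are left untouched, since only arrays of dtype_in are converted.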



Produces a graph to facilitate the execution.

One example:

Convert ONNX into graph

An example on how to convert an ONNX graph into a graph.


import pprint
import numpy
from mlprodict.npy.xop import loadop
from mlprodict.onnxrt import OnnxInference

OnnxAiOnnxMlLinearRegressor = loadop(
    ('', 'LinearRegressor'))

pars = dict(coefficients=numpy.array([1., 2.]))
onx = OnnxAiOnnxMlLinearRegressor(
    'X', output_names=['Y'], **pars)
model_def = onx.to_onnx(
    {'X': pars['coefficients'].astype(numpy.float32)},
    outputs={'Y': numpy.float32})
oinf = OnnxInference(model_def)


    {'attributes': {},
     'functions': {},
     'inits': {},
     'inputs': {'X': {'name': 'X',
                      'type': {'elem': 'float', 'kind': 'tensor', 'shape': (2,)}}},
     'intermediate': {'Y': None},
     'ir_version': 8,
     'nodes': {'_linearregressor': Onnx-LinearRegressor(X) -> Y    (name='_linearregressor')},
     'outputs': {'Y': {'name': 'Y',
                       'type': {'elem': 'float',
                                'kind': 'tensor',
                                'shape': ('?',)}}},
     'sequence': [Onnx-LinearRegressor(X) -> Y    (name='_linearregressor')],
     'statics': {},
     'targets': {'': 1}}

See an example of representation in notebook ONNX visualization.



x.__contains__(y) <==> y in x.