module onnx_conv.convert#
Short summary#
module mlprodict.onnx_conv.convert
Overloads a conversion function.
Classes#
class | truncated documentation |
---|---|
Functions#
function | truncated documentation |
---|---|
convert_scorer | Converts a scorer into ONNX assuming there exists a converter associated to it. The function wraps the function … |
get_column_index | Returns a tuple (variable index, column index in that variable). The function has two different behaviours, one when … |
get_column_indices | Returns the requested graph inputs based on their indices or names. See get_column_index |
get_inputs_from_data | Produces input data for onnx runtime. |
get_sklearn_json_params | Retrieves all the parameters of a scikit-learn model. |
guess_initial_types | Guesses initial types from an array or a dataframe. |
guess_schema_from_data | Guesses initial types from a dataset. |
guess_schema_from_model | Guesses initial types from a model. |
to_onnx | Converts a model using sklearn-onnx. |
to_onnx_function | Converts a model using sklearn-onnx. The function works the same way as function to_onnx … |
Methods#
method | truncated documentation |
---|---|
Documentation#
Overloads a conversion function.
- class mlprodict.onnx_conv.convert._ParamEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)#
Bases: JSONEncoder
Constructor for JSONEncoder, with sensible defaults.
If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.
If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.
If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an OverflowError). Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.
If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
- default(obj)#
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
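The snippet below is a minimal sketch of how such a subclass can be plugged into json.dumps together with the sort_keys and separators options described above. IterableEncoder and the params dictionary are hypothetical names chosen for the illustration; this is not the implementation of _ParamEncoder.

```python
import json


class IterableEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes any iterable as a JSON list."""

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Not iterable: the base class raises the TypeError.
        return super().default(o)


# A range is not serializable by the standard encoder,
# so json calls IterableEncoder.default for it.
params = {"steps": range(3), "alpha": 0.5}

# sort_keys gives a stable key order, these separators remove all whitespace.
compact = json.dumps(params, cls=IterableEncoder, sort_keys=True,
                     separators=(",", ":"))
print(compact)  # {"alpha":0.5,"steps":[0,1,2]}
```

An object that is neither natively serializable nor iterable still raises a TypeError, because the call falls through to the base implementation.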
- mlprodict.onnx_conv.convert._fix_opset_skl2onnx()#
- mlprodict.onnx_conv.convert._guess_s2o_type(vtype: ValueInfoProto)#
- mlprodict.onnx_conv.convert._guess_type_(X, itype, dtype)#
- mlprodict.onnx_conv.convert._merge_initial_types(i_types, transform_inputs, merge)#
- mlprodict.onnx_conv.convert._new_options(options, prefix, sklop)#
- mlprodict.onnx_conv.convert._replace_tensor_type(schema, tensor_type)#
- mlprodict.onnx_conv.convert._to_onnx_function_column_transformer(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, prefix_name=None, run_shape=False, single_function=True)#
- mlprodict.onnx_conv.convert._to_onnx_function_pipeline(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, prefix_name=None, run_shape=False, single_function=True)#
- mlprodict.onnx_conv.convert.convert_scorer(fct, initial_types, name=None, target_opset=None, options=None, custom_conversion_functions=None, custom_shape_calculators=None, custom_parsers=None, white_op=None, black_op=None, final_types=None, verbose=0)#
Converts a scorer into ONNX assuming there exists a converter associated to it. The function wraps the function into a custom transformer, then calls function convert_sklearn from sklearn-onnx.
- Parameters:
fct – function to convert (or a scorer from scikit-learn)
initial_types – types information
name – name of the produced model
target_opset – target opset to use for the conversion, if different from the default one
options – additional parameters for the conversion
custom_conversion_functions – a dictionary specifying user-defined conversion functions; it takes precedence over registered converters
custom_shape_calculators – a dictionary specifying user-defined shape calculators; it takes precedence over registered shape calculators
custom_parsers – parsers determine which outputs are expected for which particular task, default parsers are defined for classifiers, regressors, pipeline but they can be rewritten, custom_parsers is a dictionary { type: fct_parser(scope, model, inputs, custom_parsers=None) }
white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed
black_op – black list of ONNX nodes forbidden while converting a pipeline, if empty, none are blacklisted
final_types – a python list. Works the same way as initial_types but not mandatory, it is used to overwrite the type (if type is not None) and the name of every output.
verbose – displays information while converting
- Returns:
ONNX graph
- mlprodict.onnx_conv.convert.get_column_index(i, inputs)#
Returns a tuple (variable index, column index in that variable). The function has two different behaviours, one when i (column index) is an integer, another one when i is a string (column name). If i is a string, the function looks for the input with this name and returns (index, 0). If i is an integer, assume there are two inputs I0 = FloatTensorType([None, 2]) and I1 = FloatTensorType([None, 3]). In this case, here are the results:
get_column_index(0, inputs) -> (0, 0)
get_column_index(1, inputs) -> (0, 1)
get_column_index(2, inputs) -> (1, 0)
get_column_index(3, inputs) -> (1, 1)
get_column_index(4, inputs) -> (1, 2)
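The mapping above can be reproduced with a short standalone sketch. It does not call mlprodict: column_index_sketch is a hypothetical helper, and each graph input is represented by a plain (name, number of columns) pair instead of a typed ONNX variable, which is an assumption made to keep the example self-contained.

```python
def column_index_sketch(i, inputs):
    """Maps a global column index or an input name to
    (variable index, column index in that variable).

    inputs is a list of (name, n_columns) pairs, a simplified
    stand-in for the graph inputs.
    """
    if isinstance(i, str):
        # A name designates a whole input: return its position and column 0.
        for index, (name, _) in enumerate(inputs):
            if name == i:
                return index, 0
        raise KeyError(f"no input named {i!r}")
    if i < 0:
        raise IndexError(f"column index {i} must be non-negative")
    # An integer counts columns across all inputs, in order.
    offset = 0
    for index, (_, n_columns) in enumerate(inputs):
        if i < offset + n_columns:
            return index, i - offset
        offset += n_columns
    raise IndexError(f"column index {i} is out of range")


inputs = [("I0", 2), ("I1", 3)]
results = [column_index_sketch(i, inputs) for i in range(5)]
print(results)  # [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]
print(column_index_sketch("I1", inputs))  # (1, 0)
```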
- mlprodict.onnx_conv.convert.get_column_indices(indices, inputs, multiple)#
Returns the requested graph inputs based on their indices or names. See get_column_index.
- Parameters:
indices – variables indices or names
inputs – graph inputs
multiple – allows columns to come from multiple variables
- Returns:
a tuple (variable name, list of requested indices) if multiple is False, or a dictionary { var_index: [ list of requested indices ] } if multiple is True
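The two return shapes can be illustrated with a standalone sketch. It does not call mlprodict: column_indices_sketch is a hypothetical helper, inputs are simplified to (name, number of columns) pairs, and raising an error when multiple is False but the columns span several variables is an assumption about the intended behaviour.

```python
def column_indices_sketch(indices, inputs, multiple):
    """Groups requested columns by the variable they belong to.

    inputs is a list of (name, n_columns) pairs, a simplified
    stand-in for the graph inputs.
    """
    names = [name for name, _ in inputs]
    # flat[k] is (variable index, column index) of the k-th global column.
    flat = [(var, col)
            for var, (_, n_columns) in enumerate(inputs)
            for col in range(n_columns)]
    grouped = {}
    for i in indices:
        if isinstance(i, str):
            var, col = names.index(i), 0
        else:
            var, col = flat[i]
        grouped.setdefault(var, []).append(col)
    if multiple:
        return grouped
    if len(grouped) != 1:
        # Assumption: a single variable is expected in this mode.
        raise NotImplementedError("columns come from several variables")
    (var, cols), = grouped.items()
    return names[var], cols


inputs = [("I0", 2), ("I1", 3)]
single = column_indices_sketch([2, 4], inputs, multiple=False)
several = column_indices_sketch([1, 2, 4], inputs, multiple=True)
print(single)   # ('I1', [0, 2])
print(several)  # {0: [1], 1: [0, 2]}
```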
- mlprodict.onnx_conv.convert.get_inputs_from_data(X, schema=None)#
Produces input data for onnx runtime.
- Parameters:
X – data
schema – schema, if None, the schema is guessed with guess_schema_from_data
- Returns:
input data
- mlprodict.onnx_conv.convert.get_sklearn_json_params(model)#
Retrieves all the parameters of a scikit-learn model.
- mlprodict.onnx_conv.convert.guess_initial_types(X, initial_types)#
Guesses initial types from an array or a dataframe.
- Parameters:
X – array or dataframe
initial_types – hints about X
- Returns:
data types
- mlprodict.onnx_conv.convert.guess_schema_from_data(X, tensor_type=None, schema=None)#
Guesses initial types from a dataset.
- Parameters:
X – dataset (dataframe, array)
tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one
schema – known schema
- Returns:
schema (list of typed and named columns)
- mlprodict.onnx_conv.convert.guess_schema_from_model(model, tensor_type=None, schema=None)#
Guesses initial types from a model.
- Parameters:
model – model
tensor_type – if not None, replaces every FloatTensorType or DoubleTensorType by this one
schema – known schema
- Returns:
schema (list of typed and named columns)
- mlprodict.onnx_conv.convert.to_onnx(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, as_function=False, prefix_name=None, run_shape=False, single_function=True)#
Converts a model using sklearn-onnx.
- Parameters:
model – model to convert or a function wrapped into _PredictScorer with function make_scorer
X – training set (at least one row), can be None, it is used to infer the input types (initial_types)
initial_types – if X is None, then initial_types must be defined
name – name of the produced model
target_opset – target opset to use for the conversion, if different from the default one
options – additional parameters for the conversion
rewrite_ops – rewrites some existing converters, the changes are permanent
white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed
black_op – black list of ONNX nodes forbidden while converting a pipeline, if empty, none are blacklisted
final_types – a python list. Works the same way as initial_types but not mandatory, it is used to overwrite the type (if type is not None) and the name of every output.
rename_strategy – renames every name in the graph to select shorter names, see onnx_rename_names
verbose – display information while converting the model
as_function – exposes every model in a pipeline as a function, the main graph contains the pipeline structure, see Use function when converting into ONNX for an example
prefix_name – used if as_function is True, to give a prefix to variable in a pipeline
run_shape – run shape inference
single_function – if as_function is True, the function returns one graph with one call to the main function if single_function is True, or a list of nodes corresponding to the graph structure otherwise
- Returns:
converted model
The function rewrites function to_onnx from sklearn-onnx but may change a few converters if rewrite_ops is True. For example, ONNX only supports TreeEnsembleRegressor for float but not for double. It becomes available if rewrite_ops=True.
How to deal with a dataframe as input?
Each column of the dataframe is considered as a named input. The first step is to make sure that every column type is correct. pandas tends to select the least generic type to hold the content of one column. ONNX does not automatically cast the data it receives. The data must have the same type when the model is converted and when the converted model receives the data to predict.
<<<
from io import StringIO
from textwrap import dedent
import numpy
import pandas
from pyquickhelper.pycode import ExtTestCase
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from mlprodict.onnx_conv import to_onnx
from mlprodict.onnxrt import OnnxInference

text = dedent('''
    __SCHEMA__
    7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
    7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red
    7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red
    11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red
    ''')
text = text.replace(
    "__SCHEMA__",
    "fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,"
    "free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,"
    "alcohol,quality,color")

X_train = pandas.read_csv(StringIO(text))
for c in X_train.columns:
    if c != 'color':
        X_train[c] = X_train[c].astype(numpy.float32)
numeric_features = [c for c in X_train if c != 'color']

pipe = Pipeline([
    ("prep", ColumnTransformer([
        ("color", Pipeline([
            ('one', OneHotEncoder()),
            ('select', ColumnTransformer(
                [('sel1', 'passthrough', [0])]))
        ]), ['color']),
        ("others", "passthrough", numeric_features)
    ])),
])

pipe.fit(X_train)
pred = pipe.transform(X_train)
print(pred)

model_onnx = to_onnx(pipe, X_train, target_opset=12)
oinf = OnnxInference(model_onnx)

# The dataframe is converted into a dictionary,
# each key is a column name, each value is a numpy array.
inputs = {c: X_train[c].values for c in X_train.columns}
inputs = {c: v.reshape((v.shape[0], 1)) for c, v in inputs.items()}

onxp = oinf.run(inputs)
print(onxp)
>>>
[[1.000e+00 7.400e+00 7.000e-01 0.000e+00 1.900e+00 7.600e-02 1.100e+01 3.400e+01 9.978e-01 3.510e+00 5.600e-01 9.400e+00 5.000e+00]
 [1.000e+00 7.800e+00 8.800e-01 0.000e+00 2.600e+00 9.800e-02 2.500e+01 6.700e+01 9.968e-01 3.200e+00 6.800e-01 9.800e+00 5.000e+00]
 [1.000e+00 7.800e+00 7.600e-01 4.000e-02 2.300e+00 9.200e-02 1.500e+01 5.400e+01 9.970e-01 3.260e+00 6.500e-01 9.800e+00 5.000e+00]
 [1.000e+00 1.120e+01 2.800e-01 5.600e-01 1.900e+00 7.500e-02 1.700e+01 6.000e+01 9.980e-01 3.160e+00 5.800e-01 9.800e+00 6.000e+00]]
{'transformed_column': array([[1.000e+00, 7.400e+00, 7.000e-01, 0.000e+00, 1.900e+00, 7.600e-02, 1.100e+01, 3.400e+01, 9.978e-01, 3.510e+00, 5.600e-01, 9.400e+00, 5.000e+00],
       [1.000e+00, 7.800e+00, 8.800e-01, 0.000e+00, 2.600e+00, 9.800e-02, 2.500e+01, 6.700e+01, 9.968e-01, 3.200e+00, 6.800e-01, 9.800e+00, 5.000e+00],
       [1.000e+00, 7.800e+00, 7.600e-01, 4.000e-02, 2.300e+00, 9.200e-02, 1.500e+01, 5.400e+01, 9.970e-01, 3.260e+00, 6.500e-01, 9.800e+00, 5.000e+00],
       [1.000e+00, 1.120e+01, 2.800e-01, 5.600e-01, 1.900e+00, 7.500e-02, 1.700e+01, 6.000e+01, 9.980e-01, 3.160e+00, 5.800e-01, 9.800e+00, 6.000e+00]], dtype=float32)}
Changed in version 0.9: Parameter as_function was added.
- mlprodict.onnx_conv.convert.to_onnx_function(model, X=None, name=None, initial_types=None, target_opset=None, options=None, rewrite_ops=False, white_op=None, black_op=None, final_types=None, rename_strategy=None, verbose=0, prefix_name=None, run_shape=False, single_function=True)#
Converts a model using sklearn-onnx. The function works the same way as function to_onnx, but every model is exported as a single function and the main graph represents the pipeline structure.
- Parameters:
model – model to convert or a function wrapped into _PredictScorer with function make_scorer
X – training set (at least one row), can be None, it is used to infer the input types (initial_types)
initial_types – if X is None, then initial_types must be defined
name – name of the produced model
target_opset – target opset to use for the conversion, if different from the default one
options – additional parameters for the conversion
rewrite_ops – rewrites some existing converters, the changes are permanent
white_op – white list of ONNX nodes allowed while converting a pipeline, if empty, all are allowed
black_op – black list of ONNX nodes forbidden while converting a pipeline, if empty, none are blacklisted
final_types – a python list. Works the same way as initial_types but not mandatory, it is used to overwrite the type (if type is not None) and the name of every output.
rename_strategy – renames every name in the graph to select shorter names, see onnx_rename_names
verbose – display information while converting the model
prefix_name – prefix for variable names
run_shape – run shape inference on the final onnx model
single_function – if True, the main graph only includes one node calling the main function
- Returns:
converted model