module testing.einsum.einsum_fct#
Short summary#
module mlprodict.testing.einsum.einsum_fct
Main functions decomposing einsum computation into simpler functions.
Classes#
class | truncated documentation
---|---
CachedEinsum | Stores all the necessary information to cache the preprocessing of an einsum equation.
Functions#
function | truncated documentation
---|---
einsum | Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects a right …
enumerate_cached_einsum | Enumerates all cached einsum functions.
optimize_decompose_einsum_equation | Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects a right …
Static Methods#
staticmethod | truncated documentation
---|---
build_einsum | Creates an instance of CachedEinsum.
Methods#
method | truncated documentation
---|---
__call__ | Calls the runtime self.runtime_.
__repr__ | usual
build | Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
build_onnx_einsum | Builds an ONNX graph with a single einsum operator.
build_runtime | Builds the runtime associated with the equation self.equation_.
default_inputs | Returns default inputs (reshaped numpy.arange + 0.7i).
Documentation#
Main functions decomposing einsum computation into simpler functions.
- class mlprodict.testing.einsum.einsum_fct.CachedEinsum(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#
Bases:
object
Stores all the necessary information to cache the preprocessing of an einsum equation.
- Parameters:
equation – numpy equation
runtime – see einsum
opset – ONNX opset
optimize – finds the best letter permutation
dtype – dtype
decompose – to decompose Einsum operator or to keep it as is
key – key used to cache this class
strategy – optimization strategy
verbose – displays progress information
The class creates the following attributes (a short usage sketch follows the list):
- equation_: corresponding to the best equivalent equation
- graph_: the corresponding graph returned by function decompose_einsum_equation
- onnx_: if a conversion to onnx is used, stores the onnx graph
- runtime_: a function used by __call__, calls the runtime
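The snippet below is a minimal sketch of how these attributes are used; it assumes, as the profiling output further down suggests, that the static method build_einsum creates the instance and immediately calls build:

import numpy
from mlprodict.testing.einsum.einsum_fct import CachedEinsum

# creates the cached object and preprocesses the equation once
ce = CachedEinsum.build_einsum(
    "abc,cd->abd", runtime='batch_dot', opset=None,
    optimize=False, dtype=numpy.float64)

m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)

# __call__ forwards the inputs to the function stored in runtime_
print(ce(m1, m2).shape)   # (2, 2, 2)
print(ce.equation_)       # best equivalent equation (unchanged when optimize=False)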
- __call__(*inputs)#
Calls the runtime self.runtime_.
- __init__(equation, runtime='batch_dot', opset=None, optimize=False, dtype=<class 'numpy.float64'>, decompose=True, strategy=None, verbose=None, key=None)#
- __repr__()#
usual
- _build_optimize()#
- _build_optimize_ml()#
- build()#
Preprocesses the equation and builds whatever is necessary to compute the result of the einsum equation.
- static build_einsum(equation, runtime, opset, optimize, dtype, decompose=True, strategy=None, verbose=None, key=None)#
Creates an instance of CachedEinsum.
- build_onnx_einsum(input_names)#
Builds an ONNX graph with a single einsum operator.
- build_runtime()#
Builds the runtime associated with the equation self.equation_.
- default_inputs(N=None)#
Returns default inputs (reshaped numpy.arange + 0.7i).
- Parameters:
N – dimension (all dimensions have the same size)
If N is None, N is given a size depending on the number of letters to avoid spending too much time on optimization.
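For illustration, a small sketch feeding these default inputs back to the instance; it reuses the hypothetical ce object built with equation "abc,cd->abd" in the earlier snippet:

# one array per input term of the equation, every dimension of size N
inputs = ce.default_inputs(N=3)
print([x.shape for x in inputs])   # expected [(3, 3, 3), (3, 3)] for "abc,cd->abd"
print(ce(*inputs).shape)           # (3, 3, 3)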
- mlprodict.testing.einsum.einsum_fct._einsum(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
- mlprodict.testing.einsum.einsum_fct.einsum(equation, *inputs, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects a right member (an explicit output term after ->).
- Parameters:
equation – einsum equation
inputs – inputs
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation, the second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into simpler operators, but it can also keep the original ONNX Einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True
- Returns:
einsum result
The available runtimes are (a short comparison sketch follows the list):
- batch_dot: the runtime is apply_einsum_sequence,
- python: one ONNX graph executed with a python runtime,
- onnxruntime1: one ONNX graph executed with onnxruntime.
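The sketch below compares the three runtimes on the same equation (it assumes onnxruntime is installed for the last one); all of them are expected to produce the same values as numpy.einsum:

import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"
m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)

expected = numpy.einsum(equation, m1, m2)
for rt in ['batch_dot', 'python', 'onnxruntime1']:
    got = einsum(equation, m1, m2, runtime=rt)
    # maximum absolute difference with numpy's result
    print(rt, numpy.abs(got - expected).max())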
The optimisation strategy can be:
- None: the same runtime is used to find the best permutation of letters,
- 'ml': a machine learned model is used to predict the best permutation of letters, this model comes from notebook Infer operator computation cost.
The function works in two steps (both steps are sketched manually right after this list):
- the first step analyses the equation to produce a computation graph, this graph can also be converted into ONNX,
- the second step runs the graph whatever the graph is.
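The sketch below performs these two steps by hand; it assumes decompose_einsum_equation can be called with the equation alone and that apply_einsum_sequence takes the resulting sequence followed by the input arrays, both helpers being importable from mlprodict.testing.einsum like einsum itself:

import numpy
from mlprodict.testing.einsum import (
    apply_einsum_sequence, decompose_einsum_equation)

# step 1: turn the equation into a sequence of simple operators
seq = decompose_einsum_equation("abc,cd->abd")

# step 2: execute that sequence on actual inputs
m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)
print(apply_einsum_sequence(seq, m1, m2).shape)   # (2, 2, 2)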
Further details are available in the documentation of function optimize_decompose_einsum_equation. The function works the same way as numpy.einsum:
<<<
import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"
m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)

np = numpy.einsum(equation, m1, m2)
print('numpy.einsum')
print(np)
print('mlprodict.testing.einsum')
mp = einsum(equation, m1, m2)
print(mp)
>>>
numpy.einsum
[[[-2.188  0.692]
  [-1.017  0.352]]

 [[-1.125 -0.248]
  [-0.167  0.136]]]
mlprodict.testing.einsum
[[[-2.188  0.692]
  [-1.017  0.352]]

 [[-1.125 -0.248]
  [-0.167  0.136]]]
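To check numerically that both implementations agree, a short sketch using numpy's testing helper:

import numpy
from mlprodict.testing.einsum import einsum

equation = "abc,cd->abd"
m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)

# raises an AssertionError if the two results differ beyond the tolerance
numpy.testing.assert_allclose(
    numpy.einsum(equation, m1, m2),
    einsum(equation, m1, m2),
    rtol=1e-8)
print("results match")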
In some cases, the einsum implementation can be optimized by looping over the possible permutations of letters:
<<<
import timeit
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum

equation = "cab,cd->ad"
m1 = numpy.random.randn(20, 20, 20)
m2 = numpy.random.randn(20, 20)

print('numpy.einsum',
      timeit.timeit('numpy.einsum(equation, m1, m2)',
                    number=200, globals=globals()))

einsum(equation, m1, m2)
print('einsum',
      timeit.timeit('einsum(equation, m1, m2)',
                    number=200, globals=globals()))

einsum(equation, m1, m2, runtime='python')
print('einsum-python',
      timeit.timeit('einsum(equation, m1, m2, runtime="python")',
                    number=200, globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1')
print('einsum-onnxruntime1',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1")',
                    number=200, globals=globals()))

einsum(equation, m1, m2, runtime='onnxruntime1', optimize=True, verbose=1)
print('einsum-onnxruntime1',
      timeit.timeit('einsum(equation, m1, m2, runtime="onnxruntime1", optimize=True)',
                    number=200, globals=globals()))

print("list of cached einsum equations")
for k, v in enumerate_cached_einsum():
    print(k, v.equation, v.equation_)
>>>
numpy.einsum 0.13381517003290355
einsum 0.1363776430953294
einsum-python 0.23073153698351234
einsum-onnxruntime1 0.33073955099098384
einsum-onnxruntime1 0.32155248697381467
list of cached einsum equations
('cab,cd->ad', 'batch_dot', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'python', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'onnxruntime1', None, False, dtype('float64'), True, None) cab,cd->ad cab,cd->ad
('cab,cd->ad', 'onnxruntime1', None, True, dtype('float64'), True, None) cab,cd->ad acd,ab->cb

[runpythonerror]
0.015 rtbest='acd,ab->cb': 100%|██████████| 25/25 [00:01<00:00, 22.42it/s]
The last example shows the time taken by every function:
<<<
import os
from pyquickhelper.pycode.profiling import profile
import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum
from mlprodict import __file__ as path

root = os.path.dirname(path)

equation = "cab,cd->ad"
m1 = numpy.random.randn(200, 20, 20)
m2 = numpy.random.randn(200, 20)


def clean(txt):
    txt = txt.replace(root, "mlprodict")
    return "\n".join(txt.split("\n")[:30])


def fct1():
    for i in range(100):
        einsum(equation, m1, m2, cache=False)

print("Profile cache with default runtime.")
res = profile(fct1)
print(root)
print(clean(res[1]))


def fct2():
    for i in range(100):
        einsum(equation, m1, m2, cache=False, runtime='python')

print("Profile cache with runtime='python'.")
res = profile(fct2)
print(root)
print(clean(res[1]))


def fct3():
    for i in range(100):
        einsum(equation, m1, m2, cache=True)

einsum(equation, m1, m2, cache=True)
print("Profile execution with default runtime.")
res = profile(fct3)
print(root)
print(clean(res[1]))


def fct4():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='python')

einsum(equation, m1, m2, cache=True, runtime='python')
print("Profile execution with runtime='python'.")
res = profile(fct4)
print(root)
print(clean(res[1]))


def fct5():
    for i in range(100):
        einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')

einsum(equation, m1, m2, cache=True, runtime='onnxruntime1')
print("Profile execution with runtime='onnxruntime1'.")
res = profile(fct5)
print(root)
print(clean(res[1]))
>>>
Profile cache with default runtime.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
133202 function calls (133002 primitive calls) in 0.517 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.517 0.517 <stdin>:27(fct1)
100 0.002 0.000 0.516 0.005 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.366 0.004 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
100 0.000 0.000 0.366 0.004 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
100 0.001 0.000 0.365 0.004 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
100 0.001 0.000 0.364 0.004 mlprodict/testing/einsum/einsum_fct.py:109(build)
100 0.001 0.000 0.363 0.004 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
100 0.003 0.000 0.362 0.004 mlprodict/testing/einsum/einsum_impl.py:85(decompose_einsum_equation)
100 0.045 0.000 0.315 0.003 mlprodict/testing/einsum/einsum_impl.py:411(_decompose_einsum_equation_simple)
100 0.000 0.000 0.148 0.001 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.001 0.000 0.148 0.001 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
100 0.001 0.000 0.147 0.001 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
100 0.007 0.000 0.146 0.001 mlprodict/testing/einsum/einsum_impl_classes.py:1206(apply_sequence)
1200 0.008 0.000 0.139 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:601(apply)
1200 0.017 0.000 0.124 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:329(compute_output_row)
1600 0.007 0.000 0.074 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
4800 0.015 0.000 0.068 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:22(single_axes)
1900 0.059 0.000 0.059 0.000 {method 'reduce' of 'numpy.ufunc' objects}
3800 0.053 0.000 0.053 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:38(<listcomp>)
100 0.008 0.000 0.053 0.001 mlprodict/testing/einsum/einsum_impl_classes.py:496(_apply_batch_dot)
500 0.006 0.000 0.047 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
500 0.030 0.000 0.041 0.000 mlprodict/testing/einsum/einsum_impl.py:227(_apply_transpose_reshape)
100 0.001 0.000 0.034 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:563(_apply_reduce_sum)
100 0.001 0.000 0.031 0.000 <__array_function__ internals>:177(sum)
100 0.001 0.000 0.030 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2162(sum)

Profile cache with runtime='python'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
1025199 function calls (1014148 primitive calls) in 3.956 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 3.969 3.969 <stdin>:36(fct2)
100 0.002 0.000 3.968 0.040 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 3.716 0.037 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
100 0.001 0.000 3.716 0.037 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
100 0.001 0.000 3.715 0.037 mlprodict/testing/einsum/einsum_fct.py:339(build_einsum)
100 0.001 0.000 3.714 0.037 mlprodict/testing/einsum/einsum_fct.py:109(build)
100 0.018 0.000 3.713 0.037 mlprodict/testing/einsum/einsum_fct.py:275(build_runtime)
100 0.003 0.000 2.458 0.025 mlprodict/onnxrt/onnx_inference.py:101(__init__)
100 0.064 0.001 2.455 0.025 mlprodict/onnxrt/onnx_inference.py:178(_init)
2800 0.050 0.000 1.498 0.001 mlprodict/onnxrt/onnx_inference_node.py:165(setup_runtime)
2800 0.040 0.000 1.417 0.001 mlprodict/onnxrt/ops.py:9(load_op)
391/1 0.006 0.000 1.047 1.047 <frozen importlib._bootstrap>:1002(_find_and_load)
391/1 0.005 0.000 1.047 1.047 <frozen importlib._bootstrap>:967(_find_and_load_unlocked)
391/1 0.005 0.000 1.046 1.046 <frozen importlib._bootstrap>:659(_load_unlocked)
374/1 0.003 0.000 1.046 1.046 <frozen importlib._bootstrap_external>:784(exec_module)
410/1 0.001 0.000 1.046 1.046 <frozen importlib._bootstrap>:220(_call_with_frames_removed)
375/1 0.002 0.000 1.046 1.046 {built-in method builtins.exec}
1 0.000 0.000 1.046 1.046 mlprodict/onnxrt/ops_cpu/__init__.py:2(<module>)
100 0.029 0.000 0.870 0.009 mlprodict/testing/einsum/einsum_impl_classes.py:1464(to_onnx)
1 0.006 0.006 0.831 0.831 mlprodict/onnxrt/ops_cpu/_op_list.py:3(<module>)
100 0.164 0.002 0.594 0.006 mlprodict/onnxrt/onnx_inference.py:524(to_sequence)
11036/9867 0.037 0.000 0.468 0.000 {method 'join' of 'str' objects}
182 0.007 0.000 0.445 0.002 mlprodict/onnxrt/doc/doc_helper.py:152(get_rst_doc)
182 0.003 0.000 0.436 0.002 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/jinja2/environment.py:1256(render)
15947 0.053 0.000 0.403 0.000 <template>:5(root)

Profile execution with default runtime.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
35402 function calls in 0.150 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.150 0.150 <stdin>:46(fct3)
100 0.002 0.000 0.149 0.001 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.146 0.001 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.001 0.000 0.146 0.001 mlprodict/testing/einsum/einsum_fct.py:287(<lambda>)
100 0.001 0.000 0.145 0.001 mlprodict/testing/einsum/einsum_impl.py:165(apply_einsum_sequence)
100 0.007 0.000 0.144 0.001 mlprodict/testing/einsum/einsum_impl_classes.py:1206(apply_sequence)
1200 0.007 0.000 0.136 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:601(apply)
1400 0.005 0.000 0.072 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
100 0.008 0.000 0.052 0.001 mlprodict/testing/einsum/einsum_impl_classes.py:496(_apply_batch_dot)
500 0.006 0.000 0.048 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
500 0.039 0.000 0.039 0.000 {method 'reduce' of 'numpy.ufunc' objects}
100 0.001 0.000 0.035 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:563(_apply_reduce_sum)
100 0.001 0.000 0.032 0.000 <__array_function__ internals>:177(sum)
100 0.001 0.000 0.031 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2162(sum)
400 0.001 0.000 0.022 0.000 <__array_function__ internals>:177(prod)
400 0.002 0.000 0.020 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2927(prod)
200 0.003 0.000 0.019 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:411(_apply_expand_dims)
100 0.013 0.000 0.015 0.000 mlprodict/testing/einsum/blas_lapack.py:93(gemm_dot)
300 0.001 0.000 0.014 0.000 <__array_function__ internals>:177(expand_dims)
400 0.004 0.000 0.014 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:423(_apply_transpose)
300 0.005 0.000 0.012 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/lib/shape_base.py:512(expand_dims)
1300 0.003 0.000 0.006 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:374(_get_data)
400 0.001 0.000 0.006 0.000 <__array_function__ internals>:177(transpose)
2000 0.005 0.000 0.005 0.000 {built-in method builtins.getattr}
100 0.001 0.000 0.005 0.000 mlprodict/testing/einsum/einsum_impl_classes.py:588(_apply_squeeze)

Profile execution with runtime='python'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
34102 function calls in 0.241 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.241 0.241 <stdin>:58(fct4)
100 0.002 0.000 0.240 0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.237 0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.001 0.000 0.237 0.002 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
100 0.001 0.000 0.235 0.002 mlprodict/onnxrt/onnx_inference.py:797(run)
100 0.002 0.000 0.234 0.002 mlprodict/onnxrt/onnx_inference.py:299(_run_sequence_runtime_compiled)
100 0.012 0.000 0.233 0.002 <string>:1(compiled_run)
2100 0.022 0.000 0.086 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
600 0.021 0.000 0.046 0.000 mlprodict/onnxrt/ops_cpu/op_gather.py:28(_run)
100 0.002 0.000 0.036 0.000 mlprodict/onnxrt/ops_cpu/op_reduce_sum.py:64(_run)
100 0.001 0.000 0.033 0.000 <__array_function__ internals>:177(sum)
100 0.001 0.000 0.032 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:2162(sum)
100 0.001 0.000 0.031 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/fromnumeric.py:69(_wrapreduction)
300 0.001 0.000 0.030 0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:37(_run)
400 0.003 0.000 0.030 0.000 mlprodict/onnxrt/ops_cpu/op_identity.py:17(_run)
100 0.029 0.000 0.029 0.000 {method 'reduce' of 'numpy.ufunc' objects}
300 0.010 0.000 0.029 0.000 mlprodict/onnxrt/ops_cpu/op_reshape.py:14(reshape_reference_implementation)
400 0.027 0.000 0.027 0.000 {method 'copy' of 'numpy.ndarray' objects}
600 0.006 0.000 0.025 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/_dtype.py:34(__str__)
200 0.008 0.000 0.025 0.000 mlprodict/onnxrt/ops_cpu/op_unsqueeze.py:54(_run)
600 0.006 0.000 0.018 0.000 /var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_venv/lib/python3.9/site-packages/numpy/core/_dtype.py:344(_name_get)
200 0.002 0.000 0.016 0.000 <__array_function__ internals>:177(expand_dims)
400 0.003 0.000 0.015 0.000 mlprodict/onnxrt/ops_cpu/op_transpose.py:23(_run)
100 0.000 0.000 0.015 0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:57(_run)
100 0.000 0.000 0.014 0.000 mlprodict/onnxrt/ops_cpu/op_gemm.py:27(<lambda>)

Profile execution with runtime='onnxruntime1'.
/var/lib/jenkins/workspace/mlprodict/mlprodict_UT_39_std/_doc/sphinxdoc/source/mlprodict
2202 function calls in 0.242 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 0.242 0.242 <stdin>:69(fct5)
100 0.002 0.000 0.241 0.002 mlprodict/testing/einsum/einsum_fct.py:457(einsum)
100 0.000 0.000 0.238 0.002 mlprodict/testing/einsum/einsum_fct.py:327(__call__)
100 0.001 0.000 0.237 0.002 mlprodict/testing/einsum/einsum_fct.py:303(<lambda>)
100 0.001 0.000 0.236 0.002 mlprodict/onnxrt/onnx_inference.py:797(run)
100 0.002 0.000 0.235 0.002 mlprodict/onnxrt/onnx_inference.py:1329(_run_whole_runtime)
100 0.233 0.002 0.233 0.002 mlprodict/onnxrt/ops_whole/session.py:97(run)
100 0.000 0.000 0.001 0.000 mlprodict/testing/einsum/einsum_fct.py:379(optimize_decompose_einsum_equation)
100 0.000 0.000 0.001 0.000 mlprodict/testing/einsum/einsum_fct.py:357(_einsum)
100 0.000 0.000 0.000 0.000 mlprodict/onnxrt/onnx_inference.py:1402(<dictcomp>)
300 0.000 0.000 0.000 0.000 mlprodict/testing/einsum/einsum_fct.py:655(<genexpr>)
100 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
100 0.000 0.000 0.000 0.000 mlprodict/testing/einsum/einsum_fct.py:304(<dictcomp>)
200 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
200 0.000 0.000 0.000 0.000 {built-in method builtins.len}
100 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
100 0.000 0.000 0.000 0.000 {method 'values' of 'dict' objects}
100 0.000 0.000 0.000 0.000 {built-in method builtins.iter}
100 0.000 0.000 0.000 0.000 {built-in method builtins.next}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
- mlprodict.testing.einsum.einsum_fct.enumerate_cached_einsum()#
Enumerates all cached einsum functions.
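A minimal sketch of how the cache can be inspected after a couple of calls; as the output above shows, the cache key combines the equation, the runtime and the other preprocessing options:

import numpy
from mlprodict.testing.einsum import einsum
from mlprodict.testing.einsum.einsum_fct import enumerate_cached_einsum

m1 = numpy.random.randn(2, 2, 2)
m2 = numpy.random.randn(2, 2)
einsum("abc,cd->abd", m1, m2)                    # fills the cache
einsum("abc,cd->abd", m1, m2, runtime='python')  # second entry, different key

for key, cached in enumerate_cached_einsum():
    print(key, cached.equation_)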
- mlprodict.testing.einsum.einsum_fct.optimize_decompose_einsum_equation(equation, dtype, optimize=False, runtime='batch_dot', cache=True, opset=None, decompose=True, strategy=None, verbose=None)#
Proposes a new implementation of numpy.einsum. It does not allow expressions using … and expects a right member (an explicit output term after ->).
- Parameters:
equation – einsum equation
optimize – permutes all letters to find the best permutation
runtime – runtime used to compute the results once the computation graph is produced (see below)
cache – if True, the function stores the preprocessing done for a specific equation, the second call with the same equation is much faster
opset – ONNX opset to use for some runtimes
decompose – by default, the function decomposes the equation into simpler operators, but it can also keep the original ONNX Einsum operator.
strategy – optimisation strategy (see below)
verbose – display progress if optimize is True
- Returns:
an object of type CachedEinsum (see below)
The available runtimes are:
- batch_dot: the runtime is apply_einsum_sequence,
- python: one ONNX graph executed with a python runtime,
- onnxruntime1: one ONNX graph executed with onnxruntime.
The optimisation strategy can be:
- None: the same runtime is used to find the best permutation of letters,
- 'ml': a machine learned model is used to predict the best permutation of letters, this model comes from notebook Infer operator computation cost.
The function works in two steps:
- the first step analyses the equation to produce a computation graph, this graph can also be converted into ONNX,
- the second step runs the graph whatever the graph is.
The function returns an object of type CachedEinsum which has the following members after optimization:
- equation_: corresponding to the best equivalent equation
- graph_: the corresponding graph returned by function decompose_einsum_equation
- onnx_: if a conversion to onnx is used, stores the onnx graph
- runtime_: a function used by __call__, calls the runtime
- oinf_: an object of type OnnxInference
- timed_permutations_: memorizes the results of the optimization
<<<
import numpy
from mlprodict.testing.einsum import optimize_decompose_einsum_equation

seq_opt = optimize_decompose_einsum_equation(
    "bsnh,btnh->bnts", numpy.float64, strategy='ml', verbose=1,
    runtime="python", optimize=True)
print("best equation:", seq_opt.equation_)
>>>
4.5 mlbest='bhts,bnts->btnh': 100%|##########| 121/121 [00:03<00:00, 36.59it/s]
best equation: bhts,bnts->btnh