{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# ONNX visualization\n", "\n", "[ONNX](https://onnx.ai/) is a serialization format for machine learning models. It defines a set of mathematical operators that can describe the prediction function of standard and deep machine learning models. The module [onnx](https://github.com/onnx/onnx) offers some tools to [display an ONNX graph](http://www.xavierdupre.fr/app/sklearn-onnx/helpsphinx/auto_examples/plot_pipeline.html). [Netron](https://github.com/lutzroeder/netron) is another option. This notebook explores a lighter visualization."]}, {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 2, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import add_notebook_menu\n", "add_notebook_menu()"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Train a model"]}, {"cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [{"data": {"text/plain": ["LogisticRegression(solver='liblinear')"]}, "execution_count": 3, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.datasets import load_iris\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LogisticRegression\n", "iris = load_iris()\n", "X, y = iris.data, iris.target\n", "X_train, X_test, y_train, y_test = train_test_split(X, y)\n", "clr = LogisticRegression(solver='liblinear')\n", "clr.fit(X_train, y_train)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Convert a model"]}, {"cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": ["import numpy\n", "from mlprodict.onnx_conv import to_onnx\n", "model_onnx = to_onnx(clr, X_train.astype(numpy.float32))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Explore it with OnnxInference"]}, {"cell_type": "code", "execution_count": 4, "metadata": {"scrolled": false}, "outputs": [{"data": {"text/plain": ["OnnxInference(...)"]}, "execution_count": 5, "metadata": {}, "output_type": "execute_result"}], "source": ["from mlprodict.onnxrt import OnnxInference\n", "\n", "sess = OnnxInference(model_onnx)\n", "sess"]}, {"cell_type": "code", "execution_count": 5, "metadata": {"scrolled": false}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["OnnxInference(...)\n", " ir_version: 4\n", " producer_name: \"skl2onnx\"\n", " producer_version: \"1.7.1076\"\n", " domain: \"ai.onnx\"\n", " model_version: 0\n", " doc_string: \"\"\n", " graph {\n", " node {\n", " input: \"X\"\n", " output: \"label\"\n", " output: \"probability_tensor\"\n", " name: 
\"LinearClassifier\"\n", " op_type: \"LinearClassifier\"\n", " attribute {\n", " name: \"classlabels_ints\"\n", " ints: 0\n", " ints: 1\n", " ints: 2\n", " type: INTS\n", " }\n", " attribute {\n", " name: \"coefficients\"\n", " floats: 0.3895888328552246\n", " floats: 1.3643852472305298\n", " floats: -2.140394449234009\n", " floats: -0.9475928544998169\n", " floats: 0.3562876284122467\n", " floats: -1.4181873798370361\n", " floats: 0.5958272218704224\n", " floats: -1.3317818641662598\n", " floats: -1.5090725421905518\n", " floats: -1.3937636613845825\n", " floats: 2.168299436569214\n", " floats: 2.3770956993103027\n", " type: FLOATS\n", " }\n", " attribute {\n", " name: \"intercepts\"\n", " floats: 0.23760676383972168\n", " floats: 0.8039277791976929\n", " floats: -1.0647538900375366\n", " type: FLOATS\n", " }\n", " attribute {\n", " name: \"multi_class\"\n", " i: 1\n", " type: INT\n", " }\n", " attribute {\n", " name: \"post_transform\"\n", " s: \"LOGISTIC\"\n", " type: STRING\n", " }\n", " domain: \"ai.onnx.ml\"\n", " }\n", " node {\n", " input: \"probability_tensor\"\n", " output: \"probabilities\"\n", " name: \"Normalizer\"\n", " op_type: \"Normalizer\"\n", " attribute {\n", " name: \"norm\"\n", " s: \"L1\"\n", " type: STRING\n", " }\n", " domain: \"ai.onnx.ml\"\n", " }\n", " node {\n", " input: \"label\"\n", " output: \"output_label\"\n", " name: \"Cast\"\n", " op_type: \"Cast\"\n", " attribute {\n", " name: \"to\"\n", " i: 7\n", " type: INT\n", " }\n", " domain: \"\"\n", " }\n", " node {\n", " input: \"probabilities\"\n", " output: \"output_probability\"\n", " name: \"ZipMap\"\n", " op_type: \"ZipMap\"\n", " attribute {\n", " name: \"classlabels_int64s\"\n", " ints: 0\n", " ints: 1\n", " ints: 2\n", " type: INTS\n", " }\n", " domain: \"ai.onnx.ml\"\n", " }\n", " name: \"mlprodict_ONNX(LogisticRegression)\"\n", " input {\n", " name: \"X\"\n", " type {\n", " tensor_type {\n", " elem_type: 1\n", " shape {\n", " dim {\n", " }\n", " dim {\n", " dim_value: 4\n", " 
}\n", " }\n", " }\n", " }\n", " }\n", " output {\n", " name: \"output_label\"\n", " type {\n", " tensor_type {\n", " elem_type: 7\n", " shape {\n", " dim {\n", " }\n", " }\n", " }\n", " }\n", " }\n", " output {\n", " name: \"output_probability\"\n", " type {\n", " sequence_type {\n", " elem_type {\n", " map_type {\n", " key_type: 7\n", " value_type {\n", " tensor_type {\n", " elem_type: 1\n", " }\n", " }\n", " }\n", " }\n", " }\n", " }\n", " }\n", " }\n", " opset_import {\n", " domain: \"ai.onnx.ml\"\n", " version: 1\n", " }\n", " opset_import {\n", " domain: \"\"\n", " version: 9\n", " }\n", "\n"]}], "source": ["print(sess)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## dot"]}, {"cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["digraph{\n", " ranksep=0.25;\n", " nodesep=0.05;\n", " orientation=portrait;\n", "\n", " X [shape=box color=red label=\"X\\nfloat((0, 4))\" fontsize=10];\n", "\n", " output_label [shape=box color=green label=\"output_label\\nint64((0,))\" fontsize=10];\n", " output_probability [shape=box color=green label=\"output_probability\\n[{int64, {'kind': 'tensor', 'elem': 'float', 'shape': }}]\" fontsize=10];\n", "\n", "\n", " label [shape=box label=\"label\" fontsize=10];\n", " probability_tensor [shape=box label=\"probability_tensor\" fontsize=10];\n", " LinearClassifier [shape=box style=\"filled,rounded\" color=orange label=\"LinearClassifier\\n(LinearClassifier)\\nclasslabels_ints=[0 1 2]\\ncoefficients=[ 0.38958883 1.36...\\nintercepts=[ 0.23760676 0.8039...\\nmulti_class=1\\npost_transform=b'LOGISTIC'\" fontsize=10];\n", " X -> LinearClassifier;\n", " LinearClassifier -> label;\n", " LinearClassifier -> probability_tensor;\n", "\n", " probabilities [shape=box label=\"probabilities\" fontsize=10];\n", " Normalizer [shape=box style=\"filled,rounded\" color=orange label=\"Normalizer\\n(Normalizer)\\nnorm=b'L1'\" fontsize=10];\n", " probability_tensor -> 
Normalizer;\n", " Normalizer -> probabilities;\n", "\n", " Cast [shape=box style=\"filled,rounded\" color=orange label=\"Cast\\n(Cast)\\nto=7\" fontsize=10];\n", " label -> Cast;\n", " Cast -> output_label;\n", "\n", " ZipMap [shape=box style=\"filled,rounded\" color=orange label=\"ZipMap\\n(ZipMap)\\nclasslabels_int64s=[0 1 2]\" fontsize=10];\n", " probabilities -> ZipMap;\n", " ZipMap -> output_probability;\n", "}\n"]}], "source": ["dot = sess.to_dot()\n", "print(dot)"]}, {"cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 8, "metadata": {}, "output_type": "execute_result"}], "source": ["from jyquickhelper import RenderJsDot\n", "RenderJsDot(dot) # add local=True if nothing shows up"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## magic commands\n", "\n", "The module implements a magic command to easily display graphs."]}, {"cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["The mlprodict extension is already loaded. To reload it, use:\n", " %reload_ext mlprodict\n"]}], "source": ["%load_ext mlprodict"]}, {"cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 10, "metadata": {}, "output_type": "execute_result"}], "source": ["# add -l 1 if nothing shows up\n", "%onnxview model_onnx"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Shape information\n", "\n", "The Python runtime can be used to estimate the shape of each node's output."]}, {"cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 11, "metadata": {}, "output_type": "execute_result"}], "source": ["%onnxview model_onnx -a 1"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The shape ``(n, 2)`` means a matrix with an indefinite number of rows and 2 columns."]}, {"cell_type": "markdown", "metadata": {}, "source": ["## runtime\n", "\n", "Let's compute the prediction using a Python runtime."]}, {"cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [{"data": {"text/plain": ["{0: array([0.84339281, 0.01372288, 0.77424892, 0.00095374, 0.04052374]),\n", " 1: array([0.15649399, 0.71819778, 0.22563196, 0.25979154, 0.7736001 ]),\n", " 2: array([1.13198419e-04, 2.68079336e-01, 1.19117272e-04, 7.39254721e-01,\n", " 1.85876160e-01])}"]}, "execution_count": 12, "metadata": {}, "output_type": "execute_result"}], "source": ["prob = sess.run({'X': X_test})['output_probability']\n", "prob[:5]"]}, {"cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[8.43392810e-01, 1.56493992e-01, 1.13198419e-04],\n", " [1.37228844e-02, 7.18197780e-01, 2.68079336e-01],\n", " [7.74248918e-01, 2.25631964e-01, 1.19117272e-04],\n", " [9.53737402e-04, 2.59791542e-01, 7.39254721e-01],\n", " [4.05237433e-02, 7.73600097e-01, 1.85876160e-01]])"]}, "execution_count": 13, "metadata": {}, "output_type": "execute_result"}], "source": ["import pandas\n", "prob = pandas.DataFrame(list(prob)).values\n", "prob[:5]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Which we compare to the original model."]}, {"cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [{"data": {"text/plain": ["array([[8.43392800e-01, 1.56494002e-01, 1.13198441e-04],\n", " [1.37228764e-02, 7.18197725e-01, 2.68079398e-01],\n", " [7.74248907e-01, 2.25631976e-01, 1.19117296e-04],\n", " [9.53736800e-04, 2.59791543e-01, 7.39254720e-01],\n", " [4.05237263e-02, 7.73600070e-01, 1.85876204e-01]])"]}, "execution_count": 14, 
"metadata": {}, "output_type": "execute_result"}], "source": ["clr.predict_proba(X_test)[:5]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Some time measurement..."]}, {"cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["86.7 \u00b5s \u00b1 7.33 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10000 loops each)\n"]}], "source": ["%timeit clr.predict_proba(X_test)"]}, {"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["52.5 \u00b5s \u00b1 4.53 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10000 loops each)\n"]}], "source": ["%timeit sess.run({'X': X_test})['output_probability']"]}, {"cell_type": "markdown", "metadata": {}, "source": ["With one observation:"]}, {"cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["77.6 \u00b5s \u00b1 4.07 \u00b5s per loop (mean \u00b1 std. dev. of 7 runs, 10000 loops each)\n"]}], "source": ["%timeit clr.predict_proba(X_test[:1])"]}, {"cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["40.6 \u00b5s \u00b1 913 ns per loop (mean \u00b1 std. dev. of 7 runs, 10000 loops each)\n"]}], "source": ["%timeit sess.run({'X': X_test[:1]})['output_probability']"]}, {"cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": ["%matplotlib inline"]}, {"cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [{"data": {"image/png": "\n", "text/plain": ["
"]}, "metadata": {"needs_background": "light"}, "output_type": "display_data"}], "source": ["from pyquickhelper.pycode.profiling import profile\n", "pr, df = profile(lambda: sess.run({'X': X_test})['output_probability'], as_df=True)\n", "ax = df[['namefct', 'cum_tall']].head(n=20).set_index('namefct').plot(kind='bar', figsize=(12, 3), rot=30)\n", "ax.set_title(\"example of a graph\")\n", "for la in ax.get_xticklabels():\n", " la.set_horizontalalignment('right');"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Add metadata\n", "\n", "It is possible to add metadata once the model is converted."]}, {"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": ["meta = model_onnx.metadata_props.add()\n", "meta.key = \"key_meta\"\n", "meta.value = \"value_meta\""]}, {"cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [{"data": {"text/plain": ["[key: \"key_meta\"\n", " value: \"value_meta\"]"]}, "execution_count": 22, "metadata": {}, "output_type": "execute_result"}], "source": ["list(model_onnx.metadata_props)"]}, {"cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [{"data": {"text/plain": ["key: \"key_meta\"\n", "value: \"value_meta\""]}, "execution_count": 23, "metadata": {}, "output_type": "execute_result"}], "source": ["model_onnx.metadata_props[0]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## Simple PCA"]}, {"cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [{"data": {"text/plain": ["PCA(n_components=2)"]}, "execution_count": 24, "metadata": {}, "output_type": "execute_result"}], "source": ["from sklearn.decomposition import PCA\n", "model = PCA(n_components=2)\n", "model.fit(X)"]}, {"cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": ["pca_onnx = to_onnx(model, X.astype(numpy.float32))"]}, {"cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [{"name": "stdout", "output_type": "stream", "text": ["The 
mlprodict extension is already loaded. To reload it, use:\n", " %reload_ext mlprodict\n"]}], "source": ["%load_ext mlprodict"]}, {"cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [{"data": {"text/html": ["
\n", ""], "text/plain": [""]}, "execution_count": 27, "metadata": {}, "output_type": "execute_result"}], "source": ["%onnxview pca_onnx -a 1"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The graph would probably run faster if the multiplication were done before the subtraction, because the subtraction is easier to do inline than the multiplication."]}, {"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": []}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2"}}, "nbformat": 4, "nbformat_minor": 2}