.. _convolutationmatmulrst:

=====================================
Convolution and Matrix Multiplication
=====================================


.. only:: html

    **Links:** :download:`notebook <convolutation_matmul.ipynb>`, :downloadlink:`html <convolutation_matmul2html.html>`, :download:`PDF <convolutation_matmul.pdf>`, :download:`python <convolutation_matmul.py>`, :downloadlink:`slides <convolutation_matmul.slides.html>`, :githublink:`GitHub|_doc/notebooks/convolutation_matmul.ipynb|*`


The
`convolution <https://en.wikipedia.org/wiki/Kernel_(image_processing)>`__
is a well known image transformation used to transform an image. It can
be used to blur, to compute the gradient in one direction and it is
widely used in deep neural networks. Having a fast implementation is
important.

.. code:: ipython3

    from jyquickhelper import add_notebook_menu
    add_notebook_menu()


.. contents::
    :local:


.. code:: ipython3

    %matplotlib inline

.. code:: ipython3

    %load_ext mlprodict

numpy
-----

Image have often 4 dimensions (N, C, H, W) = (batch, channels, height,
width). Let’s first start with a 2D image.

.. code:: ipython3

    import numpy
    
    shape = (5, 7)
    N = numpy.prod(shape)
    data = numpy.arange(N).astype(numpy.float32).reshape(shape)
    # data[:, :] = 0
    # data[2, 3] = 1
    data.shape


.. parsed-literal::
    (5, 7)


Let’s a 2D kernel, the same one.

.. code:: ipython3

    kernel = (numpy.arange(9) + 1).reshape(3, 3).astype(numpy.float32)
    kernel


.. parsed-literal::
    array([[1., 2., 3.],
           [4., 5., 6.],
           [7., 8., 9.]], dtype=float32)


raw convolution
~~~~~~~~~~~~~~~

A raw version of a 2D convolution.

.. code:: ipython3

    def raw_convolution(data, kernel):
        rx = (kernel.shape[0] - 1) // 2
        ry = (kernel.shape[1] - 1) // 2
        res = numpy.zeros(data.shape, dtype=data.dtype)
        for i in range(data.shape[0]):
            for j in range(data.shape[1]):
                for x in range(kernel.shape[0]):
                    for y in range(kernel.shape[1]):
                        a = i + x - rx
                        b = j + y - ry
                        if a < 0 or b < 0 or a >= data.shape[0] or b >= data.shape[1]:
                            continue
                        res[i, j] += kernel[x, y] * data[a, b]
        return res
    
    res = raw_convolution(data, kernel)
    res.shape


.. parsed-literal::
    (5, 7)


.. code:: ipython3

    res


.. parsed-literal::
    array([[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],
           [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],
           [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],
           [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],
           [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]], dtype=float32)


With pytorch
~~~~~~~~~~~~

*pytorch* is optimized for deep learning and prefers 4D tenors to
represent multiple images. We add two empty dimension to the previous
example.

.. code:: ipython3

    from torch import from_numpy
    from torch.nn.functional import conv2d

.. code:: ipython3

    rest = conv2d(from_numpy(data[numpy.newaxis, numpy.newaxis, ...]), 
                  from_numpy(kernel[numpy.newaxis, numpy.newaxis, ...]),
                  padding=(1, 1))
    rest.shape


.. parsed-literal::
    torch.Size([1, 1, 5, 7])


.. code:: ipython3

    rest


.. parsed-literal::
    tensor([[[[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],
              [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],
              [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],
              [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],
              [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]]]])


Everything works.

.. code:: ipython3

    from numpy.testing import assert_almost_equal
    assert_almost_equal(res, rest[0, 0].numpy())

using Gemm?
~~~~~~~~~~~

A fast implementation could reuse whatever exists with a fast
implementation such as a matrix multiplication. The goal is to transform
the tensor ``data`` into a new matrix which can be mutiplied with a
flatten kernel and finally reshaped into the expected result. pytorch
calls this function
`Unfold <https://pytorch.org/docs/stable/generated/torch.nn.Unfold.html>`__.
This function is also called
`im2col <https://caffe.berkeleyvision.org/tutorial/layers/im2col.html>`__.

.. code:: ipython3

    from torch.nn import Unfold
    unfold = Unfold(kernel_size=(3, 3), padding=(1, 1))(from_numpy(data[numpy.newaxis, numpy.newaxis, ...]))
    unfold.shape


.. parsed-literal::
    torch.Size([1, 9, 35])


We then multiply this matrix with the flattened kernel and reshape it.

.. code:: ipython3

    impl = kernel.flatten() @ unfold.numpy()
    impl = impl.reshape(data.shape)
    impl.shape


.. parsed-literal::
    (5, 7)


.. code:: ipython3

    impl


.. parsed-literal::
    array([[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],
           [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],
           [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],
           [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],
           [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]], dtype=float32)


Everything works as expected.

.. code:: ipython3

    assert_almost_equal(res, impl)

What is ConvTranspose?
----------------------

Deep neural network are trained with a stochastic gradient descent. The
gradient of every layer needs to be computed including the gradient of a
convolution transpose. That seems easier with the second expression of a
convolution relying on a matrix multiplication and function ``im2col``.
``im2col`` is just a new matrix built from ``data`` where every value
was copied in 9=3x3 locations. The gradient against an input value
``data[i,j]`` is the sum of 9=3x3 values from the output gradient. If
``im2col`` plays with indices, the gradient requires to do the same
thing in the other way.

.. code:: ipython3

    # impl[:, :] = 0
    # impl[2, 3] = 1
    impl


.. parsed-literal::
    array([[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],
           [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],
           [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],
           [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],
           [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]], dtype=float32)


.. code:: ipython3

    from torch.nn.functional import conv_transpose2d
    
    ct = conv_transpose2d(from_numpy(impl.reshape(data.shape)[numpy.newaxis, numpy.newaxis, ...]),
                          from_numpy(kernel[numpy.newaxis, numpy.newaxis, ...]),
                          padding=(1, 1)).numpy()
    ct


.. parsed-literal::
    array([[[[ 2672.,  5379.,  6804.,  7659.,  8514.,  8403.,  6254.],
             [ 8117., 15408., 18909., 20790., 22671., 21780., 15539.],
             [14868., 27315., 32400., 34425., 36450., 34191., 23922.],
             [20039., 35544., 41283., 43164., 45045., 41508., 28325.],
             [18608., 32055., 36756., 38151., 39546., 35943., 23966.]]]],
          dtype=float32)


And now the version with ``col2im`` or
`Fold <https://pytorch.org/docs/stable/generated/torch.nn.Fold.html#torch.nn.Fold>`__
applied on the result product of the output from ``Conv`` and the
kernel: the output of ``Conv`` is multiplied by every coefficient of the
kernel. Then all these matrices are concatenated to build a matrix of
the same shape of ``unfold``.

.. code:: ipython3

    p = kernel.flatten().reshape((-1, 1)) @ impl.flatten().reshape((1, -1))
    p.shape


.. parsed-literal::
    (9, 35)


.. code:: ipython3

    from torch.nn import Fold
    
    fold = Fold(kernel_size=(3, 3), output_size=(5, 7), padding=(1, 1))(from_numpy(p[numpy.newaxis, ...]))
    fold.shape


.. parsed-literal::
    torch.Size([1, 1, 5, 7])


.. code:: ipython3

    fold


.. parsed-literal::
    tensor([[[[ 2672.,  5379.,  6804.,  7659.,  8514.,  8403.,  6254.],
              [ 8117., 15408., 18909., 20790., 22671., 21780., 15539.],
              [14868., 27315., 32400., 34425., 36450., 34191., 23922.],
              [20039., 35544., 41283., 43164., 45045., 41508., 28325.],
              [18608., 32055., 36756., 38151., 39546., 35943., 23966.]]]])


onnxruntime-training
--------------------

Following lines shows how *onnxruntime* handles the gradient
computation. This section still needs work.

Conv
~~~~

.. code:: ipython3

    from mlprodict.npy.xop import loadop
    OnnxConv = loadop(('', 'Conv'))
    node = OnnxConv('X', kernel[numpy.newaxis, numpy.newaxis, ...], pads=[1, 1, 1, 1])
    onx = node.to_onnx(numpy.float32, numpy.float32)
    %onnxview onx


.. parsed-literal::
    No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5'


.. raw:: html

    <div id="M6e4b3f5960a7493295700d3e0d745c22-cont"><div id="M6e4b3f5960a7493295700d3e0d745c22" style="width:;height:;"></div></div>
    <script>

    require(['http://www.xavierdupre.fr/js/vizjs/viz.js'], function() { var svgGraph = Viz("digraph{\n  nodesep=0.05;\n  orientation=portrait;\n  size=7;\n  ranksep=0.25;\n\n  X [shape=box color=red label=\"X\nfloat(('?',))\" fontsize=10];\n\n  out_con_0 [shape=box color=green label=\"out_con_0\nfloat(('?',))\" fontsize=10];\n\n  init [shape=box label=\"init\nfloat32((1, 1, 3, 3))\n[[[[1. 2. 3.]\n   [4. 5. 6.]\n   [7. 8. 9.]]]]\" fontsize=10];\n\n  _conv [shape=box style=\"filled,rounded\" color=orange label=\"Conv\n(_conv)\npads=[1 1 1 1]\" fontsize=10];\n  X -> _conv;\n  init -> _conv;\n  _conv -> out_con_0;\n}");
    document.getElementById('M6e4b3f5960a7493295700d3e0d745c22').innerHTML = svgGraph; });

    </script>


.. code:: ipython3

    from mlprodict.onnxrt import OnnxInference
    oinf = OnnxInference(onx, runtime='onnxruntime1')
    oinf.run({'X': data[numpy.newaxis, numpy.newaxis, ...]})['out_con_0']


.. parsed-literal::
    array([[[[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],
             [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],
             [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],
             [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],
             [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]]]],
          dtype=float32)


It is the same.

.. code:: ipython3

    from onnxcustom.training.grad_helper import onnx_derivative, DerivativeOptions
    grad = onnx_derivative(onx, options=DerivativeOptions.FillGrad | DerivativeOptions.KeepOutputs)

.. code:: ipython3

    %onnxview grad


.. raw:: html

    <div id="M5cce4ce785834f039a973abe1c5ea317-cont"><div id="M5cce4ce785834f039a973abe1c5ea317" style="width:;height:;"></div></div>
    <script>

    require(['http://www.xavierdupre.fr/js/vizjs/viz.js'], function() { var svgGraph = Viz("digraph{\n  nodesep=0.05;\n  orientation=portrait;\n  size=7;\n  ranksep=0.25;\n\n  X [shape=box color=red label=\"X\nfloat(('?',))\" fontsize=10];\n  init [shape=box color=red label=\"init\nfloat((1, 1, 3, 3))\" fontsize=10];\n\n  X_grad [shape=box color=green label=\"X_grad\nfloat(('?',))\" fontsize=10];\n  init_grad [shape=box color=green label=\"init_grad\nfloat((1, 1, 3, 3))\" fontsize=10];\n  out_con_0 [shape=box color=green label=\"out_con_0\nfloat(('?',))\" fontsize=10];\n\n\n  _conv [shape=box style=\"filled,rounded\" color=orange label=\"Conv\n(_conv)\nauto_pad=b'NOTSET'\ngroup=1\npads=[1 1 1 1]\" fontsize=10];\n  X -> _conv;\n  init -> _conv;\n  _conv -> out_con_0;\n\n  out_con_0_shape [shape=box label=\"out_con_0_shape\" fontsize=10];\n  Shape [shape=box style=\"filled,rounded\" color=orange label=\"Shape\n(Shape)\" fontsize=10];\n  out_con_0 -> Shape;\n  Shape -> out_con_0_shape;\n\n  out_con_0_grad [shape=box label=\"out_con_0_grad\" fontsize=10];\n  ConstantOfShape [shape=box style=\"filled,rounded\" color=orange label=\"ConstantOfShape\n(ConstantOfShape)\nvalue=[1.]\" fontsize=10];\n  out_con_0_shape -> ConstantOfShape;\n  ConstantOfShape -> out_con_0_grad;\n\n  _conv_Grad_ConvGrad_0 [shape=box style=\"filled,rounded\" color=orange label=\"ConvGrad\n(_conv_Grad_ConvGrad_0)\nauto_pad=b'NOTSET'\ngroup=1\npads=[1 1 1 1]\" fontsize=10];\n  out_con_0_grad -> _conv_Grad_ConvGrad_0;\n  X -> _conv_Grad_ConvGrad_0;\n  init -> _conv_Grad_ConvGrad_0;\n  _conv_Grad_ConvGrad_0 -> X_grad;\n  _conv_Grad_ConvGrad_0 -> init_grad;\n}");
    document.getElementById('M5cce4ce785834f039a973abe1c5ea317').innerHTML = svgGraph; });

    </script>


.. code:: ipython3

    oinf = OnnxInference(grad, runtime='onnxruntime1')

.. code:: ipython3

    res = oinf.run({'X': data[numpy.newaxis, numpy.newaxis, ...],
                    'init': kernel[numpy.newaxis, numpy.newaxis, ...]})
    res


.. parsed-literal::
    {'X_grad': array([[[[12., 21., 21., 21., 21., 21., 16.],
              [27., 45., 45., 45., 45., 45., 33.],
              [27., 45., 45., 45., 45., 45., 33.],
              [27., 45., 45., 45., 45., 45., 33.],
              [24., 39., 39., 39., 39., 39., 28.]]]], dtype=float32),
     'init_grad': array([[[[312., 378., 336.],
              [495., 595., 525.],
              [480., 574., 504.]]]], dtype=float32),
     'out_con_0': array([[[[ 134.,  211.,  250.,  289.,  328.,  367.,  238.],
              [ 333.,  492.,  537.,  582.,  627.,  672.,  423.],
              [ 564.,  807.,  852.,  897.,  942.,  987.,  612.],
              [ 795., 1122., 1167., 1212., 1257., 1302.,  801.],
              [ 422.,  571.,  592.,  613.,  634.,  655.,  382.]]]],
           dtype=float32)}


ConvTranspose
~~~~~~~~~~~~~

.. code:: ipython3

    from mlprodict.npy.xop import loadop
    
    OnnxConvTranspose = loadop('ConvTranspose')
    node = OnnxConvTranspose('X', kernel[numpy.newaxis, numpy.newaxis, ...], pads=[1, 1, 1, 1])
    onx = node.to_onnx(numpy.float32, numpy.float32)
    %onnxview onx


.. raw:: html

    <div id="M29d322ddfd174b3899c8b52a8e349de3-cont"><div id="M29d322ddfd174b3899c8b52a8e349de3" style="width:;height:;"></div></div>
    <script>

    require(['http://www.xavierdupre.fr/js/vizjs/viz.js'], function() { var svgGraph = Viz("digraph{\n  nodesep=0.05;\n  orientation=portrait;\n  size=7;\n  ranksep=0.25;\n\n  X [shape=box color=red label=\"X\nfloat(('?',))\" fontsize=10];\n\n  out_con_0 [shape=box color=green label=\"out_con_0\nfloat(('?',))\" fontsize=10];\n\n  init [shape=box label=\"init\nfloat32((1, 1, 3, 3))\n[[[[1. 2. 3.]\n   [4. 5. 6.]\n   [7. 8. 9.]]]]\" fontsize=10];\n\n  _convtranspose [shape=box style=\"filled,rounded\" color=orange label=\"ConvTranspose\n(_convtranspose)\npads=[1 1 1 1]\" fontsize=10];\n  X -> _convtranspose;\n  init -> _convtranspose;\n  _convtranspose -> out_con_0;\n}");
    document.getElementById('M29d322ddfd174b3899c8b52a8e349de3').innerHTML = svgGraph; });

    </script>


.. code:: ipython3

    oinf = OnnxInference(onx, runtime='onnxruntime1')
    ct = oinf.run({'X': impl[numpy.newaxis, numpy.newaxis, ...]})['out_con_0']
    ct


.. parsed-literal::
    array([[[[ 2672.,  5379.,  6804.,  7659.,  8514.,  8403.,  6254.],
             [ 8117., 15408., 18909., 20790., 22671., 21780., 15539.],
             [14868., 27315., 32400., 34425., 36450., 34191., 23922.],
             [20039., 35544., 41283., 43164., 45045., 41508., 28325.],
             [18608., 32055., 36756., 38151., 39546., 35943., 23966.]]]],
          dtype=float32)


im2col and col2im
-----------------

Function ``im2col`` transforms an image so that the convolution of this
image can be expressed as a matrix multiplication. It takes the image
and the kernel shape.

.. code:: ipython3

    from mlprodict.onnxrt.ops_cpu.op_conv_helper import im2col
    
    v = numpy.arange(5).astype(numpy.float32)
    w = im2col(v, (3, ))
    w


.. parsed-literal::
    array([[0., 0., 1.],
           [0., 1., 2.],
           [1., 2., 3.],
           [2., 3., 4.],
           [3., 4., 0.]], dtype=float32)


.. code:: ipython3

    k = numpy.array([1, 1, 1], dtype=numpy.float32)
    conv = w @ k
    conv


.. parsed-literal::
    array([1., 3., 6., 9., 7.], dtype=float32)


Let’s compare with the numpy function.

.. code:: ipython3

    numpy.convolve(v, k, mode='same')


.. parsed-literal::
    array([1., 3., 6., 9., 7.], dtype=float32)


:math:`conv(v, k) = im2col(v, shape(k)) \; k = w \; k` where
:math:`w = im2col(v, shape(k))`.

In deep neural network, the gradient is propagated from the last layer
to the first one. At some point, the backpropagation produces the
gradient :math:`\frac{d(E)}{d(conv)}`, the gradient of the error against
the outputs of the convolution layer. Then
:math:`\frac{d(E)}{d(v)} = \frac{d(E)}{d(conv(v, k))}\frac{d(conv(v, k))}{d(v)}`.

We need to compute
:math:`\frac{d(conv(v, k))}{d(v)} = \frac{d(conv(v, k))}{d(w)}\frac{d(w)}{d(v)}`.

We can say that :math:`\frac{d(conv(v, k))}{d(w)} = k`.

That leaves
:math:`\frac{d(w)}{d(v)} = \frac{d(im2col(v, shape(k)))}{d(v)}`. And
this last term is equal to :math:`im2col(m, shape(k))` where :math:`m`
is a matrix identical to :math:`v` except that all not null parameter
are replaced by 1. To summarize:
:math:`\frac{d(im2col(v, shape(k)))}{d(v)} = im2col(v \neq 0, shape(k))`.

Finally,
:math:`\frac{d(E)}{d(v)} = \frac{d(E)}{d(conv(v, k))}\frac{d(conv(v, k))}{d(v)} = \frac{d(E)}{d(conv(v, k))} \; k \; im2col(v \neq 0, shape(k))`.

Now, :math:`im2col(v \neq 0, shape(k))` is a very simple matrix with
only ones or zeros. Is there a way we can avoid doing the matrix
multiplication but simply adding terms? That’s the purpose of function
:math:`col2im` defined so that:

:math:`\frac{d(E)}{d(v)} = \frac{d(E)}{d(conv(v, k))} \; k \; im2col(v \neq 0, shape(k)) = col2im\left(\frac{d(E)}{d(conv(v, k))} \; k, shape(k) \right)`