# QLinearMatMul

## QLinearMatMul - 10

Version

• domain: main

• since_version: 10

• function: False

• support_level: SupportType.COMMON

• shape inference: True

This version of the operator has been available since version 10.

Summary

Matrix product that behaves like numpy.matmul: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html. The operator consumes two quantized input tensors, their scales and zero points, and the scale and zero point of the output, and computes the quantized output.

The quantization formula is y = saturate((x / y_scale) + y_zero_point). The quotient (x / y_scale) is rounded to the nearest integer, with ties rounded to even. Refer to https://en.wikipedia.org/wiki/Rounding for details.

Scale and zero point must have the same shape. They must be either scalars (per-tensor quantization) or N-D tensors (per-row quantization for ‘a’ and per-column quantization for ‘b’). If the input is 2D with shape [M, K], the zero point and scale tensors may be an M-element vector [v_1, v_2, …, v_M] for per-row quantization or a K-element vector [v_1, v_2, …, v_K] for per-column quantization. If the input is an N-D tensor with shape [D1, D2, M, K], the zero point and scale tensors may have shape [D1, D2, M, 1] for per-row quantization or shape [D1, D2, 1, K] for per-column quantization.

Multiplication must never overflow, and accumulation may overflow if and only if it is performed in 32 bits.
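The computation described above can be sketched in NumPy as follows. This is a minimal illustration, not the ONNX reference implementation. The function name `qlinear_matmul_ref` is hypothetical. The sketch assumes that x in the formula is the real-valued product of the dequantized inputs, and that scales and zero points are scalars or are already shaped to broadcast against their tensors. The zero points 113, 114 and 118 are assumed values, chosen because they reproduce the expected output of the 2D example in the Examples section.

```python
import numpy as np


def qlinear_matmul_ref(a, a_scale, a_zero_point,
                       b, b_scale, b_zero_point,
                       y_scale, y_zero_point):
    """Hypothetical sketch of QLinearMatMul (assumes broadcastable parameters)."""
    # Subtract zero points in a wide integer type to avoid 8-bit wrap-around,
    # then apply the scales to recover real values.
    a_real = (a.astype(np.int64) - np.asarray(a_zero_point, dtype=np.int64)) \
        * np.asarray(a_scale, dtype=np.float64)
    b_real = (b.astype(np.int64) - np.asarray(b_zero_point, dtype=np.int64)) \
        * np.asarray(b_scale, dtype=np.float64)

    # np.matmul handles 2D inputs and batches of matrices (N-D inputs).
    x = np.matmul(a_real, b_real)

    # Requantize: np.rint rounds to nearest with ties to even.
    q = np.rint(x / np.asarray(y_scale, dtype=np.float64)) \
        + np.asarray(y_zero_point, dtype=np.int64)

    # Saturate to the range of the output type (taken from y_zero_point).
    out_dtype = np.asarray(y_zero_point).dtype
    info = np.iinfo(out_dtype)
    return np.clip(q, info.min, info.max).astype(out_dtype)


a = np.array([[208, 236, 0, 238], [3, 214, 255, 29]], dtype=np.uint8)
b = np.array(
    [[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
    dtype=np.uint8,
)

y = qlinear_matmul_ref(
    a, np.array([0.0066], dtype=np.float32), np.array([113], dtype=np.uint8),  # assumed zero point
    b, np.array([0.00705], dtype=np.float32), np.array([114], dtype=np.uint8),  # assumed zero point
    np.array([0.0107], dtype=np.float32), np.array([118], dtype=np.uint8),  # assumed zero point
)
print(y.tolist())  # [[168, 115, 255], [1, 66, 151]]
```

The sketch accumulates in float64 for clarity. An integer implementation would instead accumulate the zero-point-corrected products in 32 bits and apply the combined scale a_scale * b_scale / y_scale once at the end.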

Inputs

• a (heterogeneous) - T1: N-dimensional quantized matrix a

• a_scale (heterogeneous) - tensor(float): scale of quantized input a

• a_zero_point (heterogeneous) - T1: zero point of quantized input a

• b (heterogeneous) - T2: N-dimensional quantized matrix b

• b_scale (heterogeneous) - tensor(float): scale of quantized input b

• b_zero_point (heterogeneous) - T2: zero point of quantized input b

• y_scale (heterogeneous) - tensor(float): scale of quantized output y

• y_zero_point (heterogeneous) - T3: zero point of quantized output y

Outputs

• y (heterogeneous) - T3: Quantized matrix multiply results from a * b

Type Constraints

• T1 in ( tensor(int8), tensor(uint8) ): Constrain input a and its zero point data type to 8-bit integer tensor.

• T2 in ( tensor(int8), tensor(uint8) ): Constrain input b and its zero point data type to 8-bit integer tensor.

• T3 in ( tensor(int8), tensor(uint8) ): Constrain output y and its zero point data type to 8-bit integer tensor.
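The choice of T3 determines the range that saturate clamps to. The scalar sketch below illustrates this together with the ties-to-even rounding from the summary. The names `T3_RANGE` and `quantize_value` are hypothetical and exist only for this illustration.

```python
# Representable range for each allowed output type T3.
T3_RANGE = {"int8": (-128, 127), "uint8": (0, 255)}


def quantize_value(x, y_scale, y_zero_point, t3="uint8"):
    """Hypothetical helper: y = saturate(round(x / y_scale) + y_zero_point)."""
    lo, hi = T3_RANGE[t3]
    # Python's built-in round() rounds ties to the nearest even integer.
    q = round(x / y_scale) + y_zero_point
    # Saturate: clamp to the range of the chosen output type.
    return max(lo, min(hi, q))


print(quantize_value(2.5, 1.0, 0))              # 2: the tie goes to the even value
print(quantize_value(3.5, 1.0, 0))              # 4: the tie goes to the even value
print(quantize_value(500.0, 1.0, 0, "uint8"))   # 255: saturated at the upper bound
print(quantize_value(-500.0, 1.0, 0, "int8"))   # -128: saturated at the lower bound
```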

Examples

default

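```python
import numpy as np
import onnx
from onnx.backend.test.case.node import expect  # ONNX backend test helper

node = onnx.helper.make_node(
    "QLinearMatMul",
    inputs=[
        "a",
        "a_scale",
        "a_zero_point",
        "b",
        "b_scale",
        "b_zero_point",
        "y_scale",
        "y_zero_point",
    ],
    outputs=["y"],
)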

# 2D
a = np.array(
    [
        [208, 236, 0, 238],
        [3, 214, 255, 29],
    ],
    dtype=np.uint8,
)

a_scale = np.array([0.0066], dtype=np.float32)
a_zero_point = np.array([113], dtype=np.uint8)

b = np.array(
    [[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
    dtype=np.uint8,
)

b_scale = np.array([0.00705], dtype=np.float32)
b_zero_point = np.array([114], dtype=np.uint8)

y_scale = np.array([0.0107], dtype=np.float32)
y_zero_point = np.array([118], dtype=np.uint8)

output = np.array(
    [
        [168, 115, 255],
        [1, 66, 151],
    ],
    dtype=np.uint8,
)

expect(
    node,
    inputs=[
        a,
        a_scale,
        a_zero_point,
        b,
        b_scale,
        b_zero_point,
        y_scale,
        y_zero_point,
    ],
    outputs=[output],
    name="test_qlinearmatmul_2D",
)

# 3D
a = np.array(
    [
        [[208, 236, 0, 238], [3, 214, 255, 29]],
        [[208, 236, 0, 238], [3, 214, 255, 29]],
    ],
    dtype=np.uint8,
)

a_scale = np.array([0.0066], dtype=np.float32)
a_zero_point = np.array([113], dtype=np.uint8)

b = np.array(
    [
        [[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
        [[152, 51, 244], [60, 26, 255], [0, 127, 246], [127, 254, 247]],
    ],
    dtype=np.uint8,
)

b_scale = np.array([0.00705], dtype=np.float32)
b_zero_point = np.array([114], dtype=np.uint8)

y_scale = np.array([0.0107], dtype=np.float32)
y_zero_point = np.array([118], dtype=np.uint8)

output = np.array(
    [[[168, 115, 255], [1, 66, 151]], [[168, 115, 255], [1, 66, 151]]],
    dtype=np.uint8,
)

expect(
    node,
    inputs=[
        a,
        a_scale,
        a_zero_point,
        b,
        b_scale,
        b_zero_point,
        y_scale,
        y_zero_point,
    ],
    outputs=[output],
    name="test_qlinearmatmul_3D",
)
```
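Both examples above use per-tensor (single-element) scales and zero points. The sketch below illustrates the per-row and per-column parameter shapes described in the summary for N-D inputs, and shows that they broadcast against the data during dequantization. The dimension sizes and the random data are arbitrary assumptions made for illustration. Only the dequantization and the real-valued product are shown, not the full operator.

```python
import numpy as np

rng = np.random.default_rng(0)
D1, D2, M, K, N = 2, 3, 4, 5, 6  # arbitrary sizes for illustration

a = rng.integers(0, 256, size=(D1, D2, M, K), dtype=np.uint8)
b = rng.integers(0, 256, size=(D1, D2, K, N), dtype=np.uint8)

# Per-row parameters for a: one value per row, last axis has size 1.
a_scale = rng.uniform(0.001, 0.01, size=(D1, D2, M, 1)).astype(np.float32)
a_zero_point = rng.integers(0, 256, size=(D1, D2, M, 1), dtype=np.uint8)

# Per-column parameters for b: one value per column of b, so the
# second-to-last axis has size 1 and the last axis matches b's columns.
b_scale = rng.uniform(0.001, 0.01, size=(D1, D2, 1, N)).astype(np.float32)
b_zero_point = rng.integers(0, 256, size=(D1, D2, 1, N), dtype=np.uint8)

# Broadcasting applies each row's (or column's) parameters to its elements.
a_real = (a.astype(np.int32) - a_zero_point.astype(np.int32)) * a_scale
b_real = (b.astype(np.int32) - b_zero_point.astype(np.int32)) * b_scale

# np.matmul multiplies the trailing two axes and batches over D1 and D2.
x = np.matmul(a_real, b_real)
print(a_real.shape, b_real.shape, x.shape)  # (2, 3, 4, 5) (2, 3, 5, 6) (2, 3, 4, 6)
```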