.. _l-onnx-doccom.microsoft-QGemm:
=====================
com.microsoft - QGemm
=====================
.. contents::
    :local:
.. _l-onnx-opcom-microsoft-qgemm-1:
QGemm - 1 (com.microsoft)
=========================
**Version**
* **name**: QGemm (GitHub)
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:
This version of the operator has been available
**since version 1 of domain com.microsoft**.
**Summary**
Quantized Gemm
**Attributes**
* **alpha**:
Scalar multiplier for the product of input tensors A * B. Default value is ``?``.
* **transA**:
Whether A should be transposed. Default value is ``?``.
* **transB**:
Whether B should be transposed. Default value is ``?``.
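As a rough illustration of how the three attributes interact, the float-domain product they describe can be sketched in plain Python. The helper name ``gemm_float`` and the defaults used here (``alpha=1.0``, ``trans_a=0``, ``trans_b=0``) are assumptions for illustration only; the defaults are not stated in this page.

```python
def gemm_float(a, b, alpha=1.0, trans_a=0, trans_b=0):
    """Hypothetical float reference: alpha * op(A) @ op(B).

    Matrices are lists of rows. The default attribute values are
    assumptions, not values taken from the operator schema.
    """
    def transpose(m):
        return [list(row) for row in zip(*m)]

    a = transpose(a) if trans_a else a  # (K, M) -> (M, K)
    b = transpose(b) if trans_b else b  # (N, K) -> (K, N)
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of op(A) and op(B) do not match")
    return [
        [alpha * sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


a = [[1.0, 2.0], [3.0, 4.0]]                 # shape (M, K) = (2, 2)
b_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # shape (N, K) = (3, 2)
y = gemm_float(a, b_t, alpha=0.5, trans_b=1)  # result has shape (M, N) = (2, 3)
```

With ``trans_b=1`` the second operand is supplied as (N, K) and transposed before the product, which matches the shape rules given for input B.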
**Inputs**
Between 6 and 9 inputs.
* **A** (heterogeneous) - **TA**:
Input tensor A. The shape of A should be (M, K) if transA is 0, or
(K, M) if transA is non-zero.
* **a_scale** (heterogeneous) - **T**:
Scale of quantized input 'A'. It is a scalar, which means
per-tensor quantization.
* **a_zero_point** (heterogeneous) - **TA**:
Zero point tensor for input 'A'. It is a scalar.
* **B** (heterogeneous) - **TB**:
Input tensor B. The shape of B should be (K, N) if transB is 0, or
(N, K) if transB is non-zero.
* **b_scale** (heterogeneous) - **T**:
Scale of quantized input 'B'. It can be a scalar (per-tensor
quantization) or a 1-D tensor (per-column quantization). If it is a
1-D tensor, its number of elements must equal the number of columns
of input 'B'.
* **b_zero_point** (heterogeneous) - **TB**:
Zero point tensor for input 'B'. It is optional and defaults to 0.
It can be a scalar (per-tensor quantization) or a 1-D tensor
(per-column quantization). If it is a 1-D tensor, its number of
elements must equal the number of columns of input 'B'.
* **C** (optional, heterogeneous) - **TC**:
Optional input tensor C. If not specified, the computation is done
as if C is a scalar 0. The shape of C should be unidirectionally
broadcastable to (M, N). Its type is int32_t, and it must be quantized
with zero_point = 0 and scale = alpha / beta * a_scale * b_scale.
* **y_scale** (optional, heterogeneous) - **T**:
Scale of output 'Y'. It is a scalar, which means per-tensor
quantization. It is optional. If it is not provided, the output is
full precision (float32); otherwise the output is quantized.
* **y_zero_point** (optional, heterogeneous) - **TYZ**:
Zero point tensor for output 'Y'. It is a scalar, which means
per-tensor quantization. It is optional. If it is not provided, the
output is full precision (float32); otherwise the output is
quantized.
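The difference between per-tensor and per-column quantization of input 'B' can be sketched as follows. This is a minimal illustration, assuming transB is 0 so that B has shape (K, N) and the 1-D tensors have N elements; ``dequantize_b`` is a hypothetical helper, not part of any library.

```python
def dequantize_b(b, b_scale, b_zero_point=0):
    """Hypothetical helper: map quantized B of shape (K, N) to real values.

    b_scale and b_zero_point are either scalars (per-tensor) or
    sequences with one entry per column of B (per-column).
    """
    n = len(b[0])

    def per_column(value):
        # Expand a scalar to one entry per column; keep sequences as given.
        if isinstance(value, (list, tuple)):
            return list(value)
        return [value] * n

    scales = per_column(b_scale)
    zero_points = per_column(b_zero_point)
    if len(scales) != n or len(zero_points) != n:
        raise ValueError("1-D scale/zero point must have one entry per column of B")
    return [
        [(q - zero_points[j]) * scales[j] for j, q in enumerate(row)]
        for row in b
    ]


b = [[10, 20], [30, 40]]
per_tensor = dequantize_b(b, 0.5, 10)                # one scale for all of B
per_column = dequantize_b(b, [0.5, 0.25], [10, 0])   # one scale per column
```

In the per-column case each column of B is rescaled independently, which is why the 1-D tensors need exactly as many elements as B has columns.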
**Outputs**
* **Y** (heterogeneous) - **TY**:
Output tensor of shape (M, N).
**Examples**
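The following is a minimal reference sketch in plain Python, not the runtime's implementation. It assumes transA and transB are 0, a default alpha of 1.0, a 1-D C of length N that is broadcast across rows, a uint8 output range when the output is quantized, and round-half-to-even followed by saturation. Because C is described as quantized with zero_point = 0 and scale = alpha / beta * a_scale * b_scale, the sketch also assumes C can be added directly to the integer accumulator. The function name ``qgemm`` and the ``qmin``/``qmax`` parameters are hypothetical.

```python
def qgemm(a, a_scale, a_zp, b, b_scale, b_zp=0, c=None,
          y_scale=None, y_zp=0, alpha=1.0, qmin=0, qmax=255):
    """Hypothetical QGemm reference for A (M, K) and B (K, N), no transposes."""
    m, k, n = len(a), len(b), len(b[0])
    # Expand scalar B parameters so per-tensor and per-column share one path.
    scales = list(b_scale) if isinstance(b_scale, (list, tuple)) else [b_scale] * n
    zps = list(b_zp) if isinstance(b_zp, (list, tuple)) else [b_zp] * n

    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Integer accumulation over zero-point-corrected inputs.
            acc = sum((a[i][p] - a_zp) * (b[p][j] - zps[j]) for p in range(k))
            if c is not None:
                acc += c[j]  # assumption: C shares the accumulator's scale
            real = alpha * a_scale * scales[j] * acc
            if y_scale is None:
                row.append(real)  # full-precision output
            else:
                q = round(real / y_scale) + y_zp  # round half to even
                row.append(max(qmin, min(qmax, q)))  # saturate to output range
        out.append(row)
    return out


a = [[130, 126], [128, 132]]   # real values [[1.0, -1.0], [0.0, 2.0]]
b = [[2, 4], [6, 8]]           # real values [[0.5, 1.0], [1.5, 2.0]]
c = [8, 0]                     # int32 bias, one entry per output column

y_float = qgemm(a, 0.5, 128, b, 0.25, 0, c)
y_quant = qgemm(a, 0.5, 128, b, 0.25, 0, c, y_scale=0.5, y_zp=1)
```

When ``y_scale`` is omitted the sketch returns real-valued results; when it is given, the same results are rescaled, offset by the zero point, and clamped, so the negative entry in this example saturates at 0.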