.. _l-onnx-doccom.microsoft-QOrderedMatMul:

==============================
com.microsoft - QOrderedMatMul
==============================

.. contents::
    :local:


.. _l-onnx-opcom-microsoft-qorderedmatmul-1:

QOrderedMatMul - 1 (com.microsoft)
==================================

**Version**

* **name**: `QOrderedMatMul (GitHub) <https://github.com/onnx/onnx/blob/main/docs/Operators.md#com.microsoft.QOrderedMatMul>`_
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:

This version of the operator has been available
**since version 1 of domain com.microsoft**.

**Summary**

Quantized (int8) MatMul with cublasLt matrix orders. Computes
Y = alpha * A * B + bias + beta * C, where A, B, C, and Y are all int8 matrices.

Two combinations of orders are supported:

* When order_B is ORDER_COL, order_A must be ORDER_ROW.
  bias is a float32 vector of length {#cols of Y}. C must have batch size 1 or
  batch_A; B may have batch size 1 or batch_A.
  Note that B is either reordered to ORDER_COL or transposed, not transposed
  first and then reordered.
* When order_B is ORDER_COL4_4R2_8C or ORDER_COL32_2R_4R4, order_A must be
  ORDER_COL32. The product is computed as Y = alpha * (A * B) + beta * C, and
  bias is not supported. In this case B is transposed first and then reordered
  into ORDER_COL4_4R2_8C or ORDER_COL32_2R_4R4.

order_Y and order_C are the same as order_A.
Per-column quantized weights are supported, i.e., scale_B may be a 1-D vector of size [#cols of matrix B].
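
The formula above can be read in real-valued terms. The sketch below is a minimal pure-Python reference, not the cublasLt kernel. It ignores the memory orders entirely, takes B in its logical K x N shape, and assumes alpha = scale_A * scale_B / scale_Y and beta = scale_C / scale_Y, which the summary does not state explicitly. The function name ``qordered_matmul_reference`` and the rounding mode (Python's ``round``) are also illustrative assumptions.

```python
def saturate_int8(x):
    """Round to the nearest integer and clamp to the int8 range."""
    return max(-128, min(127, round(x)))


def qordered_matmul_reference(A, scale_A, B, scale_B, scale_Y,
                              bias=None, C=None, scale_C=None):
    """Hypothetical reference for QOrderedMatMul on nested lists.

    A: [batch][M][K] int8 values.
    B: [K][N] int8 values (logical shape; layout/transposition is ignored).
    scale_B: a float, or a list of N floats (per-column quantization).
    bias: list of N floats in real-valued units, or None.
    C: [batch or 1][M][N] int8 values, or None (then scale_C is unused).
    """
    K, N = len(B), len(B[0])
    # Broadcast a scalar scale_B to one scale per output column.
    col_scale = list(scale_B) if isinstance(scale_B, (list, tuple)) else [scale_B] * N
    Y = []
    for b, A_b in enumerate(A):
        # C with batch size 1 is shared by every batch of A.
        C_b = None if C is None else C[b if len(C) > 1 else 0]
        Y_b = []
        for m, row in enumerate(A_b):
            out_row = []
            for n in range(N):
                acc = sum(row[k] * B[k][n] for k in range(K))  # integer accumulation
                real = scale_A * col_scale[n] * acc            # dequantized A * B
                if bias is not None:
                    real += bias[n]                            # bias is already real-valued
                if C_b is not None:
                    real += scale_C * C_b[m][n]                # dequantized C
                out_row.append(saturate_int8(real / scale_Y))  # requantize to int8
            Y_b.append(out_row)
        Y.append(Y_b)
    return Y


A = [[[1, 2], [3, 4]]]          # batch of 1, shape 2 x 2
B = [[1, 0], [0, 1]]            # identity weights
print(qordered_matmul_reference(A, 0.5, B, [1.0, 2.0], 0.5))
# prints [[[1, 4], [3, 8]]]
```

With per-column ``scale_B`` of ``[1.0, 2.0]``, the second output column is scaled twice as much as the first, which is why the identity weights do not reproduce A exactly.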

**Attributes**

* **order_A** (required):
  cublasLt order of matrix A. See the schema of QuantizeWithOrder for
  order definition. Default value is ``?``.
* **order_B** (required):
  cublasLt order of matrix B. Default value is ``?``.
* **order_Y** (required):
  cublasLt order of matrix Y and optional matrix C. Default value is ``?``.

**Inputs**

Between 5 and 8 inputs.

* **A** (heterogeneous) - **Q**:
  3-dimensional matrix A
* **scale_A** (heterogeneous) - **S**:
  scale of the input A.
* **B** (heterogeneous) - **Q**:
  2-dimensional matrix B. Transposed if order_B is ORDER_COL.
* **scale_B** (heterogeneous) - **S**:
  scale of the input B. Scalar or 1-D float32.
* **scale_Y** (heterogeneous) - **S**:
  scale of the output Y.
* **bias** (optional, heterogeneous) - **S**:
  1d bias, not scaled with scale_Y.
* **C** (optional, heterogeneous) - **Q**:
  3d or 2d matrix C. If 2d, it is expanded to 3d first. Shape[0] should be
  1 or the same as A.shape[0].
* **scale_C** (optional, heterogeneous) - **S**:
  scale of the input C.

**Outputs**

* **Y** (heterogeneous) - **Q**:
  Matrix multiply results from A * B

**Examples**