.. _l-onnx-doccom.microsoft-QOrderedMatMul:

==============================
com.microsoft - QOrderedMatMul
==============================

.. contents::
    :local:

.. _l-onnx-opcom-microsoft-qorderedmatmul-1:

QOrderedMatMul - 1 (com.microsoft)
==================================

**Version**

* **name**: QOrderedMatMul (GitHub)
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:

This version of the operator has been available
**since version 1 of domain com.microsoft**.

**Summary**

Quantized (int8) MatMul with order. Implements
``Y = alpha * A * B + bias + beta * C``. Matrices A, B, C and Y are all
int8 matrices. Two order combinations are supported:

* When order_B is ORDER_COL, order_A must be ORDER_ROW. bias is a float32
  vector of length {#cols of Y}. C should have batch size 1 or batch_A.
  B may have batch size 1 or batch_A. Note that B is reordered to ORDER_COL,
  which is the same as transposing it; it is not transposed first and then
  reordered.
* When order_B is ORDER_COL4_4R2_8C or ORDER_COL32_2R_4R4, order_A must be
  ORDER_COL32. MatMul is implemented as ``alpha * (A * B) + beta * C => Y``.
  bias is not supported in this case. B is transposed first and then
  reordered into ORDER_COL4_4R2_8C or ORDER_COL32_2R_4R4.

order_Y and order_C are the same as order_A. Per-column quantized weights
are supported, i.e. scale_B may be a 1-D vector of size
[#cols of matrix B].

**Attributes**

* **order_A** (required):
  cublasLt order of matrix A. See the schema of QuantizeWithOrder for
  the order definition. Default value is ``?``.
* **order_B** (required):
  cublasLt order of matrix B. Default value is ``?``.
* **order_Y** (required):
  cublasLt order of matrix Y and optional matrix C. Default value is ``?``.

**Inputs**

Between 5 and 8 inputs.

* **A** (heterogeneous) - **Q**:
  3-dimensional matrix A.
* **scale_A** (heterogeneous) - **S**:
  Scale of the input A.
* **B** (heterogeneous) - **Q**:
  2-dimensional matrix B. Transposed if order_B is ORDER_COL.
* **scale_B** (heterogeneous) - **S**:
  Scale of the input B. Scalar or 1-D float32.
* **scale_Y** (heterogeneous) - **S**:
  Scale of the output Y.
* **bias** (optional, heterogeneous) - **S**:
  1-D bias, not scaled with scale_Y.
* **C** (optional, heterogeneous) - **Q**:
  3-D or 2-D matrix C. If 2-D, it is expanded to 3-D first. Shape[0]
  should be 1 or the same as A.shape[0].
* **scale_C** (optional, heterogeneous) - **S**:
  Scale of the input C.

**Outputs**

* **Y** (heterogeneous) - **Q**:
  Matrix multiply results from A * B.

**Examples**
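
To illustrate the arithmetic described in the summary, here is a minimal pure-Python sketch. It is not the ONNX Runtime kernel and rests on several assumptions that the schema text does not state: that the result is equivalent to dequantizing A, B and C with their scales, adding the float bias as-is, and requantizing with scale_Y; that rounding is Python's ``round`` followed by saturation to the int8 range; and that the data is handled in its logical row-major layout, ignoring the cublasLt memory orders entirely. The helper names ``quantize`` and ``qordered_matmul_reference`` are hypothetical and introduced only for this example.

```python
def quantize(x, scale):
    """Divide by the scale, round, and saturate to the int8 range.

    Assumption: Python's round() (ties to even) stands in for the
    kernel's rounding mode, which the schema does not specify.
    """
    return max(-128, min(127, round(x / scale)))


def qordered_matmul_reference(a_q, scale_a, b_q, scale_b, scale_y,
                              bias=None, c_q=None, scale_c=None):
    """Dequantize, multiply in float, then requantize (illustrative only).

    a_q: batch x rows x k nested lists of int8 values.
    b_q: k x cols nested lists of int8 values (logical layout, not reordered).
    scale_b: a float, or a sequence with one scale per column of B.
    c_q: optional, 1 x rows x cols or batch x rows x cols.
    """
    k, cols = len(b_q), len(b_q[0])
    per_column = isinstance(scale_b, (list, tuple))
    col_scales = list(scale_b) if per_column else [scale_b] * cols

    y_q = []
    for batch_index, a_mat in enumerate(a_q):
        out_mat = []
        for r, a_row in enumerate(a_mat):
            out_row = []
            for j in range(cols):
                # Integer dot product, as an int32 accumulator would hold it.
                acc = sum(a_row[i] * b_q[i][j] for i in range(k))
                real = acc * scale_a * col_scales[j]
                if bias is not None:
                    real += bias[j]  # bias is already in real (float) units
                if c_q is not None:
                    # C with batch size 1 is shared across all batches of A.
                    c_mat = c_q[0] if len(c_q) == 1 else c_q[batch_index]
                    real += c_mat[r][j] * scale_c
                out_row.append(quantize(real, scale_y))
            out_mat.append(out_row)
        y_q.append(out_mat)
    return y_q


# Power-of-two scales keep the float arithmetic exact in this small example.
a_q = [[[10, -20], [30, 40]]]   # shape 1 x 2 x 2
b_q = [[1, 2], [3, 4]]          # shape 2 x 2
y = qordered_matmul_reference(a_q, 0.5, b_q, 0.25, 0.5, bias=[1.25, -1.0])
print(y)  # [[[-10, -17], [40, 53]]]
```

Under these assumptions, the ``alpha`` and ``beta`` in the summary formula would correspond to ``scale_A * scale_B / scale_Y`` and ``scale_C / scale_Y``. Passing a list for ``scale_b`` exercises the per-column case.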