com.microsoft - QuantizeWithOrder

QuantizeWithOrder - 1 (com.microsoft)

Version

name: QuantizeWithOrder (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Quantizes the input matrix into the specific memory layout used by cuBLASLt.

Attributes

order_input (required): cuBLASLt order of the input matrix. ORDER_COL = 0, ORDER_ROW = 1, ORDER_COL32 = 2, ORDER_COL4_4R2_8C = 3, ORDER_COL32_2R_4R4 = 4. Please refer to https://docs.nvidia.com/cuda/cublas/index.html#cublasLtOrder_t for their meaning.
order_output (required): cuBLASLt order of the output matrix, using the same enumeration values as order_input.
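To make the three simplest layouts concrete, the following Python sketch computes where element (r, c) of a ROWS x COLS matrix lands in a flat buffer for ORDER_ROW, ORDER_COL and ORDER_COL32. The offset formulas follow a reading of the cuBLASLt documentation linked above (COL32 stores tiles of 32 columns one after another, with each row's 32 entries contiguous inside a tile). The helpers `offset` and `reorder` are hypothetical, zero-filling the COL32 padding is an assumption, and ORDER_COL4_4R2_8C and ORDER_COL32_2R_4R4 are deliberately not covered.

```python
# Hypothetical sketch of three cublasLtOrder_t layouts; not ONNX Runtime code.
from typing import List

ORDER_COL = 0
ORDER_ROW = 1
ORDER_COL32 = 2


def offset(order: int, r: int, c: int, rows: int, cols: int) -> int:
    """Linear offset of element (r, c) in a rows x cols matrix stored in `order`."""
    if order == ORDER_ROW:
        # Row-major: each row is contiguous.
        return r * cols + c
    if order == ORDER_COL:
        # Column-major: each column is contiguous.
        return c * rows + r
    if order == ORDER_COL32:
        # Tiles of 32 columns, each tile holding 32 * rows elements;
        # inside a tile, a row's 32 entries are contiguous.
        return (c // 32) * (32 * rows) + r * 32 + (c % 32)
    raise ValueError(f"order {order} is not covered by this sketch")


def reorder(matrix: List[List[int]], order: int) -> List[int]:
    """Flatten a row-major nested list into the buffer layout for `order`."""
    rows, cols = len(matrix), len(matrix[0])
    if order == ORDER_COL32:
        # Assumption: round the column count up to a multiple of 32, zero-filled.
        size = ((cols + 31) // 32) * 32 * rows
    else:
        size = rows * cols
    buf = [0] * size
    for r in range(rows):
        for c in range(cols):
            buf[offset(order, r, c, rows, cols)] = matrix[r][c]
    return buf
```

For a 2 x 3 matrix [[1, 2, 3], [4, 5, 6]], ORDER_ROW yields [1, 2, 3, 4, 5, 6] and ORDER_COL yields [1, 4, 2, 5, 3, 6]; under ORDER_COL32 the second row starts at offset 32, because each row of a tile reserves 32 slots.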

Inputs

input (heterogeneous) - F: input tensor of shape (ROWS, COLS). If it has fewer than 2 dimensions, it is broadcast to (1, X). If it is 3-D, it is treated as (B, ROWS, COLS).
scale_input (heterogeneous) - S: scale of the input
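The shape rules for `input` and the role of `scale_input` can be sketched as follows. This is a minimal Python illustration, not the ONNX Runtime kernel: the helper names `normalize_shape` and `quantize_int8` are hypothetical, treating a 0-D input as X = 1 and rejecting ranks above 3 are assumptions, and the quantization step assumes the common symmetric int8 scheme q = clamp(round(x / scale), -128, 127), which this page does not specify.

```python
# Hypothetical sketch of the input contract; not ONNX Runtime code.
from typing import List, Tuple


def normalize_shape(shape: Tuple[int, ...]) -> Tuple[int, int, int]:
    """Map an input shape to (B, ROWS, COLS) following the input description.

    Fewer than 2 dims -> (1, X) with B = 1; 2-D -> B = 1; 3-D -> (B, ROWS, COLS).
    """
    if len(shape) < 2:
        # Assumption: a scalar (empty shape) is treated as X = 1.
        x = shape[0] if shape else 1
        return (1, 1, x)
    if len(shape) == 2:
        return (1, shape[0], shape[1])
    if len(shape) == 3:
        return (shape[0], shape[1], shape[2])
    # Assumption: ranks above 3 are rejected in this sketch.
    raise ValueError(f"unsupported rank {len(shape)}")


def quantize_int8(values: List[float], scale: float) -> List[int]:
    """Assumed symmetric int8 quantization: clamp(round(x / scale), -128, 127)."""
    out = []
    for v in values:
        q = round(v / scale)  # Python's round() rounds half to even
        out.append(max(-128, min(127, q)))
    return out
```

Rounding half to even is chosen here only because it is Python's default; a real kernel may round differently.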

Outputs

output (heterogeneous) - Q: output tensor

Examples