com.microsoft - QuantizeBFP

QuantizeBFP - 1 (com.microsoft)

Version

name: QuantizeBFP (GitHub)
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

The block floating-point (BFP) quantization operator. It consumes a full-precision tensor and computes a BFP tensor. More documentation on the BFP format can be found in this paper: https://www.microsoft.com/en-us/research/publication/pushing-the-limits-of-narrow-precision-inferencing-at-cloud-scale-with-microsoft-floating-point/
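As a rough illustration of what "block floating-point" means, the Python sketch below quantizes a list of values in fixed-size blocks: each block stores one shared exponent (taken from its largest magnitude) plus a small signed integer mantissa per element. This is a generic BFP scheme written for explanation only. The block size and the `mantissa_bits` parameter are hypothetical, and the actual bit layouts selected by `bfp_type` are not described on this page and may differ.

```python
import math

def bfp_quantize_block(block, mantissa_bits):
    """Return (shared_exponent, mantissas) for one block of floats."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        # An all-zero block needs no meaningful exponent.
        return 0, [0] * len(block)
    # math.frexp gives max_abs = frac * 2**exp with 0.5 <= frac < 1,
    # so every value in the block has magnitude below 2**shared_exp.
    _, shared_exp = math.frexp(max_abs)
    # One bit is reserved for the sign; the rest hold the magnitude.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    # round() rounds to nearest (ties to even); the clamp guards the top edge.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in block]
    return shared_exp, mantissas

def bfp_dequantize_block(shared_exp, mantissas, mantissa_bits):
    """Rebuild approximate floats from a shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

def bfp_quantize(values, block_size, mantissa_bits):
    """Split a flat list into consecutive blocks and quantize each one."""
    return [
        bfp_quantize_block(values[i:i + block_size], mantissa_bits)
        for i in range(0, len(values), block_size)
    ]

blocks = bfp_quantize([0.5, -1.25, 3.0, 0.1], block_size=4, mantissa_bits=8)
print(blocks)  # [(2, [16, -40, 96, 3])]
print(bfp_dequantize_block(*blocks[0], mantissa_bits=8))  # [0.5, -1.25, 3.0, 0.09375]
```

With these inputs the small value 0.1 comes back as 0.09375, because its mantissa has to share the exponent chosen for 3.0. Losing precision on the small elements of a block is the trade-off BFP accepts in exchange for storing only one exponent per block.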

Attributes

bfp_type (required): The type of BFP. Must match a value of the BFPType enum.
block_dim: Each bounding box (the group of elements that shares one exponent) spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension.
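The A@W guidance for block_dim can be checked with a small Python sketch that groups a 2-D matrix into blocks along a chosen dimension. The helper `blocks_along_dim` and its `block_size` argument are hypothetical, used only to show which elements end up sharing a block; they are not part of the operator's interface.

```python
def blocks_along_dim(matrix, block_dim=1, block_size=2):
    """Group a 2-D matrix (list of rows) into blocks that run along block_dim.

    block_dim=1 (the last dimension, mirroring the documented default):
        each block is a run of consecutive elements within one row.
    block_dim=0: each block is a run of consecutive elements within one column.
    """
    rows, cols = len(matrix), len(matrix[0])
    blocks = []
    if block_dim == 1:
        for r in range(rows):
            for c0 in range(0, cols, block_size):
                blocks.append(matrix[r][c0:c0 + block_size])
    elif block_dim == 0:
        for c in range(cols):
            for r0 in range(0, rows, block_size):
                blocks.append([matrix[r][c] for r in range(r0, min(r0 + block_size, rows))])
    else:
        raise ValueError("this sketch only handles block_dim 0 or 1")
    return blocks

# A is M x K = 2 x 4 and W is K x N = 4 x 2, so K = 4 is the reduction dimension of A @ W.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
W = [[1, 2],
     [3, 4],
     [5, 6],
     [7, 8]]
print(blocks_along_dim(A, block_dim=1))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(blocks_along_dim(W, block_dim=0))  # [[1, 3], [5, 7], [2, 4], [6, 8]]
```

For A with block_dim 1 every block is a slice of a row, and for W with block_dim 0 every block is a slice of a column, so in both cases the blocks run along the shared reduction dimension K of A@W.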

Inputs

x (heterogeneous) - T1: N-D full-precision input tensor to be quantized.

Outputs

y (heterogeneous) - T2: 1-D, contiguous BFP data.
shape (heterogeneous) - T3: Shape of x.
strides (heterogeneous) - T3: Strides of x.
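Because y is a flat 1-D buffer, the shape and strides outputs carry the layout of x that the flat data alone does not, for example so that a flat element index of x can be related back to N-D coordinates. The Python sketch below assumes that x is densely packed in row-major (C) order and that strides are counted in elements rather than bytes; this page only says "Strides of x", so both points are assumptions, and the helper names are hypothetical.

```python
def contiguous_strides(shape):
    """Row-major (C-order) strides, counted in elements, for a dense tensor."""
    strides = [1] * len(shape)
    # Walk from the second-to-last axis back to the first; each stride is the
    # product of all dimension sizes to its right.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def flat_index(coords, strides):
    """Offset of an N-D coordinate in the flat element order."""
    return sum(c * s for c, s in zip(coords, strides))

def unravel_index(offset, shape, strides):
    """Inverse of flat_index for a contiguous row-major layout."""
    return tuple((offset // s) % d for d, s in zip(shape, strides))

shape = [2, 3, 4]
strides = contiguous_strides(shape)
print(strides)                            # [12, 4, 1]
print(flat_index((1, 2, 3), strides))     # 23
print(unravel_index(23, shape, strides))  # (1, 2, 3)
```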

Examples