com.microsoft - QuantizeBFP

QuantizeBFP - 1 (com.microsoft)

Version

  • name: QuantizeBFP (GitHub)

  • domain: com.microsoft

  • since_version: 1

  • function:

  • support_level:

  • shape inference:

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

The BFP quantization operator. It consumes a full-precision tensor and computes a BFP tensor. More documentation on the BFP format can be found in this paper: https://www.microsoft.com/en-us/research/publication/pushing-the-limits-of-narrow-precision-inferencing-at-cloud-scale-with-microsoft-floating-point/

Attributes

  • bfp_type (required): The type of BFP. Must match a value of the BFPType enum. Default value is ?.

  • block_dim: Each bounding box spans this dimension. Typically, the block dimension corresponds to the reduction dimension of the matrix multiplication that consumes the output of this operator. For example, for a 2D matrix multiplication A@W, QuantizeBFP(A) would use block_dim 1 and QuantizeBFP(W) would use block_dim 0. The default is the last dimension. Default value is ?.
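
The block_dim rule for A@W can be illustrated with a small, pure-Python sketch. This is not the operator's implementation: the helper name shared_exponents is hypothetical, and the sketch assumes each bounding box covers the full extent of block_dim and shares the exponent of its largest-magnitude element. The actual bounding box size and bit layout depend on bfp_type and are not described here.

```python
import math

def shared_exponents(matrix, block_dim):
    """Return one shared exponent per bounding box of a 2-D list.

    Toy assumption: a bounding box covers the whole extent of block_dim,
    and its shared exponent is that of the largest magnitude in the box.
    """
    if block_dim == 0:
        # Boxes run down the columns, so transpose to iterate over them.
        boxes = [list(col) for col in zip(*matrix)]
    elif block_dim == 1:
        # Boxes run along the rows.
        boxes = [list(row) for row in matrix]
    else:
        raise ValueError("block_dim must be 0 or 1 for a 2-D input")

    exponents = []
    for box in boxes:
        largest = max(abs(v) for v in box)
        # math.frexp returns (m, e) with largest == m * 2**e and 0.5 <= m < 1.
        exponents.append(math.frexp(largest)[1] if largest else 0)
    return exponents

A = [[1.0, 2.0, 3.0], [0.1, 0.2, 0.3]]        # 2 x 3 left operand
W = [[1.0, 8.0], [0.5, 0.25], [3.0, 0.1]]     # 3 x 2 right operand

# The reduction dimension of A@W is dim 1 of A and dim 0 of W.
a_exps = shared_exponents(A, block_dim=1)     # one exponent per row of A
w_exps = shared_exponents(W, block_dim=0)     # one exponent per column of W
```

With these choices, every dot product in A@W pairs one bounding box of A (a row) with one bounding box of W (a column), which is why the block dimension follows the reduction dimension.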

Inputs

  • x (heterogeneous) - T1: N-D full-precision input tensor to be quantized.

Outputs

  • y (heterogeneous) - T2: 1-D, contiguous BFP data

  • shape (heterogeneous) - T3: Shape of x

  • strides (heterogeneous) - T3: Strides of x
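
Because y is a flat 1-D buffer, the shape and strides outputs carry the layout of the original tensor. The sketch below shows how shape and strides relate for a contiguous tensor. It assumes row-major layout and strides counted in elements; this page does not state whether the operator reports strides in elements or bytes, and the helper names are hypothetical.

```python
def contiguous_strides(shape):
    """Row-major strides, in elements, for a contiguous tensor of this shape."""
    strides = [1] * len(shape)
    # Walk from the second-to-last dimension back to the first.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def flat_offset(index, strides):
    """Offset of a multi-dimensional index in the flattened tensor."""
    return sum(i * s for i, s in zip(index, strides))

shape = [2, 3, 4]
strides = contiguous_strides(shape)           # [12, 4, 1]
offset = flat_offset((1, 2, 3), strides)      # 1*12 + 2*4 + 3*1 = 23
```

A consumer of the quantized data would need both values to map positions in the original N-D tensor back to the flattened layout.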

Examples