.. _l-onnx-doccom.microsoft-QOrderedLongformerAttention:

===========================================
com.microsoft - QOrderedLongformerAttention
===========================================

.. contents::
    :local:

.. _l-onnx-opcom-microsoft-qorderedlongformerattention-1:

QOrderedLongformerAttention - 1 (com.microsoft)
===============================================

**Version**

* **name**: `QOrderedLongformerAttention (GitHub) `_
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:

This version of the operator has been available
**since version 1 of domain com.microsoft**.

**Summary**

Quantized version of Longformer self attention (using int8 with a specific matrix layout).

**Attributes**

* **num_heads** (required):
  Number of attention heads.
  Default value is ``?``.
* **order_global_weight** (required):
  cublasLt order of the global weight matrix.
  Default value is ``?``.
* **order_input** (required):
  cublasLt order of the input matrix. See the schema of QuantizeWithOrder
  for the order definition.
  Default value is ``?``.
* **order_output** (required):
  cublasLt order of the output matrix.
  Default value is ``?``.
* **order_weight** (required):
  cublasLt order of the weight matrix.
  Default value is ``?``.
* **window** (required):
  One-sided attention window length W, i.e. half of the total window length.
  Default value is ``?``.

**Inputs**

* **input** (heterogeneous) - **Q**:
  3D input tensor with shape (batch_size, sequence_length, hidden_size),
  where hidden_size = num_heads * head_size.
* **scale_input** (heterogeneous) - **S**:
  Scale of the input.
* **weight** (heterogeneous) - **Q**:
  2D input tensor with shape (hidden_size, 3 * hidden_size).
* **scale_weight** (heterogeneous) - **S**:
  Scale of the weight.
* **bias** (heterogeneous) - **S**:
  1D input tensor with shape (3 * hidden_size); currently fp32 only.
* **scale_bias** (heterogeneous) - **S**:
  Reserved (not used, since adding the bias requires a float value in
  cublasLt for normal order).
* **scale_qkv_gemm** (heterogeneous) - **S**:
  Scale of the output of the fused qkv gemm.
* **mask** (heterogeneous) - **F**:
  Attention mask with shape (batch_size, sequence_length).
* **global_weight** (heterogeneous) - **Q**:
  2D input tensor with shape (hidden_size, 3 * hidden_size).
* **scale_global_weight** (heterogeneous) - **S**:
  Scale of the global_weight.
* **global_bias** (heterogeneous) - **S**:
  1D input tensor with shape (3 * hidden_size).
* **scale_global_gemm** (heterogeneous) - **S**:
  Scale of the global_qkv_gemm.
* **global** (heterogeneous) - **G**:
  Global attention flags with shape (batch_size, sequence_length).
* **scale_output** (heterogeneous) - **S**:
  Scale of the output.

**Outputs**

* **output** (heterogeneous) - **Q**:
  3D output tensor with shape (batch_size, sequence_length, hidden_size).

**Examples**
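
The example below is a minimal sketch, not taken from the operator's
reference documentation: it only shows how a ``QOrderedLongformerAttention``
node in the ``com.microsoft`` domain could be assembled with
``onnx.helper.make_node``. The input and output names follow the schema
above, but the attribute values (head count, window length, cublasLt order
codes) are illustrative assumptions.

::

    from onnx import helper

    # Minimal sketch: build a QOrderedLongformerAttention node.
    # Attribute values below are placeholders, not reference settings.
    node = helper.make_node(
        "QOrderedLongformerAttention",
        inputs=[
            "input", "scale_input",
            "weight", "scale_weight",
            "bias", "scale_bias",
            "scale_qkv_gemm",
            "mask",
            "global_weight", "scale_global_weight",
            "global_bias", "scale_global_gemm",
            "global", "scale_output",
        ],
        outputs=["output"],
        domain="com.microsoft",      # contrib operator domain
        num_heads=12,                # hidden_size = num_heads * head_size
        window=256,                  # one-sided attention window length W
        # cublasLt layout codes; see the QuantizeWithOrder schema for the
        # actual order definitions. The values here are assumptions.
        order_input=1,
        order_weight=1,
        order_global_weight=1,
        order_output=1,
    )
    print(node)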