.. _l-onnx-doccom.microsoft-QOrderedLongformerAttention:

===========================================
com.microsoft - QOrderedLongformerAttention
===========================================

.. contents::
    :local:

.. _l-onnx-opcom-microsoft-qorderedlongformerattention-1:

QOrderedLongformerAttention - 1 (com.microsoft)
===============================================

**Version**

* **name**: `QOrderedLongformerAttention (GitHub) `_
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:

This version of the operator has been available
**since version 1 of domain com.microsoft**.

**Summary**

Quantized version of Longformer self attention (using int8 with a specific matrix layout).

**Attributes**

* **num_heads** (required):
  Number of attention heads.
  Default value is ``?``.
* **order_global_weight** (required):
  cublasLt order of the global weight matrix.
  Default value is ``?``.
* **order_input** (required):
  cublasLt order of the input matrix. See the schema of QuantizeWithOrder
  for the order definition.
  Default value is ``?``.
* **order_output** (required):
  cublasLt order of the output matrix.
  Default value is ``?``.
* **order_weight** (required):
  cublasLt order of the weight matrix.
  Default value is ``?``.
* **window** (required):
  One-sided attention window length W, i.e. half of the total window length.
  Default value is ``?``.

**Inputs**

* **input** (heterogeneous) - **Q**:
  3D input tensor with shape (batch_size, sequence_length, hidden_size),
  where hidden_size = num_heads * head_size.
* **scale_input** (heterogeneous) - **S**:
  Scale of the input.
* **weight** (heterogeneous) - **Q**:
  2D input tensor with shape (hidden_size, 3 * hidden_size).
* **scale_weight** (heterogeneous) - **S**:
  Scale of the weight.
* **bias** (heterogeneous) - **S**:
  1D input tensor with shape (3 * hidden_size); currently fp32 only.
* **scale_bias** (heterogeneous) - **S**:
  Reserved (not used, since adding the bias requires a float value in
  cublasLt for normal order).
* **scale_qkv_gemm** (heterogeneous) - **S**:
  Scale of the output of the fused qkv gemm.
* **mask** (heterogeneous) - **F**:
  Attention mask with shape (batch_size, sequence_length).
* **global_weight** (heterogeneous) - **Q**:
  2D input tensor with shape (hidden_size, 3 * hidden_size).
* **scale_global_weight** (heterogeneous) - **S**:
  Scale of the global_weight.
* **global_bias** (heterogeneous) - **S**:
  1D input tensor with shape (3 * hidden_size).
* **scale_global_gemm** (heterogeneous) - **S**:
  Scale of the global_qkv_gemm.
* **global** (heterogeneous) - **G**:
  Global attention flags with shape (batch_size, sequence_length).
* **scale_output** (heterogeneous) - **S**:
  Scale of the output.

**Outputs**

* **output** (heterogeneous) - **Q**:
  3D output tensor with shape (batch_size, sequence_length, hidden_size).

**Examples**
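
The example below is a minimal sketch, not taken from the operator's
reference documentation: it only shows how a ``QOrderedLongformerAttention``
node in the ``com.microsoft`` domain could be assembled with
``onnx.helper.make_node``. The input and output names follow the schema
above, but the attribute values (head count, window length, cublasLt order
codes) are illustrative assumptions.

::

    from onnx import helper

    # Minimal sketch: build a QOrderedLongformerAttention node.
    # Attribute values below are placeholders, not reference settings.
    node = helper.make_node(
        "QOrderedLongformerAttention",
        inputs=[
            "input", "scale_input",
            "weight", "scale_weight",
            "bias", "scale_bias",
            "scale_qkv_gemm",
            "mask",
            "global_weight", "scale_global_weight",
            "global_bias", "scale_global_gemm",
            "global", "scale_output",
        ],
        outputs=["output"],
        domain="com.microsoft",      # contrib operator domain
        num_heads=12,                # hidden_size = num_heads * head_size
        window=256,                  # one-sided attention window length W
        # cublasLt layout codes; see the QuantizeWithOrder schema for the
        # actual order definitions. The values here are assumptions.
        order_input=1,
        order_weight=1,
        order_global_weight=1,
        order_output=1,
    )
    print(node)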