.. _l-onnx-doccom.microsoft-BiasSoftmaxDropout:

==================================
com.microsoft - BiasSoftmaxDropout
==================================

.. contents::
    :local:


.. _l-onnx-opcom-microsoft-biassoftmaxdropout-1:

BiasSoftmaxDropout - 1 (com.microsoft)
======================================

**Version**

* **name**: `BiasSoftmaxDropout (GitHub) <https://github.com/onnx/onnx/blob/main/docs/Operators.md#com.microsoft.BiasSoftmaxDropout>`_
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:

This version of the operator has been available
**since version 1 of domain com.microsoft**.

**Summary**

Computes ``dropout_output, mask, softmax_output = Dropout(Softmax(data + bias), ratio)``. Intended to specialize the Add + Softmax + Dropout pattern commonly found in transformer models.

**Attributes**

* **axis**:
  Apply softmax to elements for dimensions ``axis`` or higher. Default value is ``?``.
* **is_inner_broadcast** (required):
  True to broadcast bias across input for dimensions broadcast_axis to
  axis-1; otherwise bias is broadcast across input for dimensions 0 to
  broadcast_axis-1. Default value is ``?``.
* **seed**:
  (Optional) Seed for the random generator. If not specified, one is
  generated automatically. Default value is ``?``.
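
The ``axis`` description above can be read as follows: dimensions before ``axis`` index independent groups, and all dimensions from ``axis`` onward are flattened into one vector that is normalized together. The block below is a minimal pure-Python sketch of that reading, not the ONNX Runtime kernel. The helper name ``softmax_from_axis`` and the flat row-major input layout are assumptions made for illustration.

```python
import math

def softmax_from_axis(flat, shape, axis):
    """Softmax over all dimensions >= axis of a row-major flat list.

    Hypothetical helper for illustration; not part of ONNX Runtime.
    """
    inner = math.prod(shape[axis:])   # elements normalized together
    outer = math.prod(shape[:axis])   # independent softmax groups
    assert outer * inner == len(flat)
    out = []
    for g in range(outer):
        chunk = flat[g * inner:(g + 1) * inner]
        m = max(chunk)                           # subtract max for numerical stability
        exps = [math.exp(v - m) for v in chunk]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out

# shape (2, 2, 2), axis=1: two groups of four elements each
probs = softmax_from_axis([0.0] * 8, shape=(2, 2, 2), axis=1)
```

With ``axis=1`` on a ``(2, 2, 2)`` input, each group of four elements sums to 1, so the eight equal inputs each map to 0.25.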

**Inputs**

Between 2 and 3 inputs.

* **data** (heterogeneous) - **T**:
  The input data as Tensor.
* **bias** (heterogeneous) - **T**:
  The bias (or mask) as Tensor.
* **ratio** (optional, heterogeneous) - **T1**:
  The ratio of random dropout, with value in [0, 1). If this input is
  not set, or if it is set to 0, the output is a simple copy of
  the input. If it is non-zero, the output is a random dropout of the
  scaled input, which is typically the case during training. It is an
  optional value; if not specified, it defaults to 0.5.

**Outputs**

* **dropout_output** (heterogeneous) - **T**:
  The dropout output.
* **mask** (heterogeneous) - **tensor(bool)**:
  The output mask of dropout.
* **softmax_output** (heterogeneous) - **T**:
  The Softmax output for backward.

**Examples**
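
A minimal pure-Python sketch of the computation in the summary, ``Dropout(Softmax(data + bias), ratio)``, for a 2-D ``data`` with softmax over the last dimension. It is meant as an aid for reading the three outputs, not as the ONNX Runtime kernel. The function name ``bias_softmax_dropout_ref``, the choice to broadcast a single ``bias`` row across every row of ``data``, and the ``1 / (1 - ratio)`` scaling of kept elements (borrowed from the standard ONNX ``Dropout`` operator) are assumptions.

```python
import math
import random

def bias_softmax_dropout_ref(data, bias, ratio=0.5, seed=None):
    """Reference sketch: returns (dropout_output, mask, softmax_output).

    data: list of rows; bias: one row broadcast over every row of data
    (an assumed broadcast pattern, chosen for illustration).
    """
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - ratio)  # assumed scaling, as in ONNX Dropout
    dropout_output, mask, softmax_output = [], [], []
    for row in data:
        biased = [x + b for x, b in zip(row, bias)]
        m = max(biased)                             # stabilize the exponentials
        exps = [math.exp(v - m) for v in biased]
        total = sum(exps)
        probs = [e / total for e in exps]
        keep = [rng.random() >= ratio for _ in probs]  # True = element kept
        softmax_output.append(probs)
        mask.append(keep)
        dropout_output.append(
            [p * scale if k else 0.0 for p, k in zip(probs, keep)]
        )
    return dropout_output, mask, softmax_output

data = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
bias = [0.0, 0.0, float("-inf")]  # bias used as a mask: last column suppressed
out, mask, probs = bias_softmax_dropout_ref(data, bias, ratio=0.0, seed=0)
```

With ``ratio=0.0`` every element is kept and ``dropout_output`` equals ``softmax_output``, which matches the statement that a ratio of 0 yields a simple copy. The ``-inf`` entry in ``bias`` shows the "bias (or mask)" use: the suppressed column receives probability 0.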