.. _l-onnx-doccom.microsoft-BiasSoftmaxDropout:

==================================
com.microsoft - BiasSoftmaxDropout
==================================

.. contents::
    :local:

.. _l-onnx-opcom-microsoft-biassoftmaxdropout-1:

BiasSoftmaxDropout - 1 (com.microsoft)
======================================

**Version**

* **name**: `BiasSoftmaxDropout (GitHub) `_
* **domain**: **com.microsoft**
* **since_version**: **1**
* **function**:
* **support_level**:
* **shape inference**:

This version of the operator has been available
**since version 1 of domain com.microsoft**.

**Summary**

Computes ``dropout_output, mask, softmax_output = Dropout(Softmax(data + bias), ratio)``.
Intended to specialize the Add + Softmax + Dropout pattern commonly found in transformer models.

**Attributes**

* **axis**:
  Apply softmax to elements for dimensions ``axis`` or higher.
  Default value is ``?``.
* **is_inner_broadcast** (required):
  True if the bias is broadcast across the input for dimensions
  broadcast_axis to axis-1; otherwise the bias is broadcast across the
  input for dimensions 0 to broadcast_axis-1.
  Default value is ``?``.
* **seed**:
  (Optional) Seed for the random generator. If not specified, one is
  generated automatically.
  Default value is ``?``.

**Inputs**

Between 2 and 3 inputs.

* **data** (heterogeneous) - **T**:
  The input data as Tensor.
* **bias** (heterogeneous) - **T**:
  The bias (or mask) as Tensor.
* **ratio** (optional, heterogeneous) - **T1**:
  The ratio of random dropout, with value in [0, 1). If this input is
  not set, or is set to 0, the output is a simple copy of the input.
  If it is non-zero, the output is a random dropout of the scaled
  input, which is typically the case during training. This input is
  optional; if not specified it defaults to 0.5.

**Outputs**

* **dropout_output** (heterogeneous) - **T**:
  The dropout output.
* **mask** (heterogeneous) - **tensor(bool)**:
  The output mask of dropout.
* **softmax_output** (heterogeneous) - **T**:
  The Softmax output, kept for the backward pass.

**Examples**
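
The following is a minimal NumPy sketch of the computation described in the summary, ``Dropout(Softmax(data + bias), ratio)``. It is not the ONNX Runtime kernel. The function name ``bias_softmax_dropout`` is hypothetical, and several details are assumptions: the bias is added with ordinary NumPy broadcasting (``is_inner_broadcast`` is not modeled), the softmax is taken over all dimensions from ``axis`` onward flattened together, kept elements are scaled by ``1 / (1 - ratio)`` as in the standard ONNX Dropout operator, and the random stream will not match the one used by the real operator.

```python
import numpy as np


def bias_softmax_dropout(data, bias, axis, ratio=0.5, seed=None):
    """Illustrative reference for Dropout(Softmax(data + bias), ratio).

    Returns (dropout_output, mask, softmax_output), mirroring the three
    outputs listed in this page. Not the ONNX Runtime implementation.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must lie in [0, 1)")

    # Assumption: plain NumPy broadcasting stands in for is_inner_broadcast.
    x = data + bias
    axis = axis % x.ndim

    # Softmax over dimensions axis..end, treated as one flattened dimension.
    rows = int(np.prod(x.shape[:axis], dtype=np.int64))
    flat = x.reshape(rows, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(flat)
    softmax_output = (exp / exp.sum(axis=1, keepdims=True)).reshape(x.shape)

    # Dropout: keep an element when its random draw is >= ratio.
    # With ratio == 0 every element is kept and the scale is 1.
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= ratio
    scale = 1.0 / (1.0 - ratio)  # assumed ONNX Dropout-style rescaling
    dropout_output = softmax_output * mask * scale

    return dropout_output, mask, softmax_output


# Usage: attention-like scores with a bias broadcast over the last dimension.
scores = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
attn_bias = np.zeros((1, 1, 4), dtype=np.float32)
dropped, keep_mask, probs = bias_softmax_dropout(
    scores, attn_bias, axis=2, ratio=0.5, seed=0
)
```

Returning ``softmax_output`` alongside the dropped result reflects the third output listed above, which the page describes as being kept for the backward pass.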