com.microsoft - BiasSoftmaxDropout

BiasSoftmaxDropout - 1 (com.microsoft)

Version

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Computes dropout_output, mask, softmax_output = Dropout(Softmax(data + bias), ratio). Intended to specialize the Add + Softmax + Dropout pattern commonly found in transformer models.
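The formula above can be sketched in plain Python for the 2-D case. This is only an illustrative reference, not the operator's kernel: the function name `bias_softmax_dropout` is hypothetical, `data` and `bias` are assumed to have the same shape (no broadcasting), softmax is assumed to run over each row, the mask is assumed to mark kept elements with True, and kept values are assumed to be scaled by 1 / (1 - ratio) as in standard inverted dropout.

```python
import math
import random


def bias_softmax_dropout(data, bias, ratio=0.5, seed=None):
    """Reference sketch for 2-D inputs: Dropout(Softmax(data + bias), ratio).

    Assumes data and bias are equal-shape lists of rows, softmax runs over
    each row, and the mask marks kept elements with True.
    """
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - ratio)  # inverted-dropout scaling; ratio is in [0, 1)
    dropout_output, mask, softmax_output = [], [], []
    for data_row, bias_row in zip(data, bias):
        logits = [d + b for d, b in zip(data_row, bias_row)]
        peak = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # random() is in [0, 1), so ratio == 0 keeps every element.
        keep = [rng.random() >= ratio for _ in probs]
        softmax_output.append(probs)
        mask.append(keep)
        dropout_output.append(
            [p * scale if k else 0.0 for p, k in zip(probs, keep)]
        )
    return dropout_output, mask, softmax_output


out, mask, probs = bias_softmax_dropout(
    [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, -1e9]],  # large negative bias masks a position
    ratio=0.0,
)
```

With ratio set to 0, nothing is dropped, so `out` equals `probs` and every mask entry is True. The second row shows the "bias as mask" use: the large negative bias drives that position's probability to zero.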

Attributes

  • axis: Apply softmax to elements for dimensions axis or higher. Default value is ?.

  • is_inner_broadcast (required): True if bias is broadcast across input for dimensions broadcast_axis to axis-1; otherwise bias is broadcast across input for dimensions 0 to broadcast_axis-1. Default value is ?.

  • seed: (Optional) Seed for the random generator. If not specified, one is generated automatically. Default value is ?.
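One possible reading of the is_inner_broadcast wording can be sketched as a small shape helper. Everything here is an assumption for illustration: `broadcast_axis` is not an attribute of this operator (it only appears in the description text), the helper names `broadcast_dims` and `expected_bias_shape` are hypothetical, and representing broadcast dimensions as size 1 is an assumed layout rather than a documented requirement.

```python
def broadcast_dims(axis, broadcast_axis, is_inner_broadcast):
    """Return the data dimensions over which bias is broadcast.

    broadcast_axis is a hypothetical parameter taken from the wording of the
    attribute description; it is not an attribute of this operator.
    """
    if is_inner_broadcast:
        return list(range(broadcast_axis, axis))  # broadcast_axis .. axis-1
    return list(range(0, broadcast_axis))  # 0 .. broadcast_axis-1


def expected_bias_shape(data_shape, axis, broadcast_axis, is_inner_broadcast):
    """Bias shape with size 1 on every broadcast dimension (assumed layout)."""
    dims = set(broadcast_dims(axis, broadcast_axis, is_inner_broadcast))
    return [1 if i in dims else n for i, n in enumerate(data_shape)]


# Attention-style scores: [batch, heads, query, key], softmax over the last axis.
data_shape = [2, 12, 128, 128]
inner = expected_bias_shape(data_shape, axis=3, broadcast_axis=1,
                            is_inner_broadcast=True)
outer = expected_bias_shape(data_shape, axis=3, broadcast_axis=1,
                            is_inner_broadcast=False)
```

Under these assumptions, `inner` is `[2, 1, 1, 128]` (one bias row per batch, shared across heads and queries) and `outer` is `[1, 12, 128, 128]` (one bias shared across the batch).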

Inputs

Between 2 and 3 inputs.

  • data (heterogeneous) - T: The input data as Tensor.

  • bias (heterogeneous) - T: The bias (or mask) as Tensor.

  • ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input is not set, or if it is set to 0, the output is a simple copy of the input. If it’s non-zero, the output is a random dropout of the scaled input, which is typically the case during training. This input is optional; if not specified, it defaults to 0.5.

Outputs

  • dropout_output (heterogeneous) - T: The dropout output.

  • mask (heterogeneous) - tensor(bool): The output mask of dropout.

  • softmax_output (heterogeneous) - T: The Softmax output, saved for use in the backward pass.
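To illustrate why mask and softmax_output are emitted alongside dropout_output, the following sketch shows how a backward pass could consume them. It is not the gradient operator shipped with the library: the function name `bias_softmax_dropout_grad` is hypothetical, inputs are assumed to be 2-D lists with row-wise softmax, and the mask is assumed to mark kept elements with True. The formulas themselves are the standard dropout and softmax gradients.

```python
def bias_softmax_dropout_grad(d_dropout, mask, softmax_output, ratio):
    """Gradient with respect to (data + bias) for 2-D inputs (a sketch)."""
    scale = 1.0 / (1.0 - ratio)
    d_logits = []
    for g_row, m_row, y_row in zip(d_dropout, mask, softmax_output):
        # Dropout backward: gradient flows only through kept elements, rescaled.
        d_soft = [g * scale if m else 0.0 for g, m in zip(g_row, m_row)]
        # Softmax backward: dx_i = y_i * (dy_i - sum_j dy_j * y_j).
        dot = sum(d * y for d, y in zip(d_soft, y_row))
        d_logits.append([y * (d - dot) for d, y in zip(d_soft, y_row)])
    return d_logits


grad = bias_softmax_dropout_grad(
    d_dropout=[[1.0, 0.0]],
    mask=[[True, True]],
    softmax_output=[[0.5, 0.5]],
    ratio=0.0,
)
```

Because the softmax gradient depends only on the softmax output and the incoming gradient, saving softmax_output in the forward pass avoids recomputing it from data and bias.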

Examples