com.microsoft - BiasSoftmaxDropout
BiasSoftmaxDropout - 1 (com.microsoft)
Version
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:
This version of the operator has been available since version 1 of domain com.microsoft.
Summary
dropout_output, mask, softmax_output = Dropout(Softmax(data + bias), ratio). Intended to specialize the Add + Softmax + Dropout pattern commonly found in transformer models.
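The formula in the summary can be sketched in NumPy as below. This is only an illustrative reference under stated assumptions, not the operator's actual kernel: the function name `bias_softmax_dropout` is hypothetical, bias is assumed to already be broadcastable to data under NumPy rules, softmax is assumed to flatten dimensions axis and higher into one, and kept elements are assumed to be scaled by 1 / (1 - ratio) as in the standard ONNX Dropout operator.

```python
import numpy as np


def bias_softmax_dropout(data, bias, axis, ratio=0.5, seed=None):
    """Sketch of Dropout(Softmax(data + bias), ratio); hypothetical helper."""
    # Assumption: bias broadcasts against data under NumPy rules.
    x = data + bias

    # Softmax over dimensions `axis` and higher, treated as one flat dimension.
    axis = axis % x.ndim
    rows = int(np.prod(x.shape[:axis]))
    flat = x.reshape(rows, -1)
    flat = flat - flat.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(flat)
    softmax_output = (e / e.sum(axis=1, keepdims=True)).reshape(x.shape)

    # A ratio of 0 keeps everything: the dropout output copies the softmax output.
    if ratio == 0:
        mask = np.ones(x.shape, dtype=bool)
        return softmax_output.copy(), mask, softmax_output

    # Assumption: kept elements are rescaled by 1 / (1 - ratio).
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= ratio
    dropout_output = np.where(mask, softmax_output / (1.0 - ratio), 0.0)
    return dropout_output, mask, softmax_output


data = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
bias = np.zeros((1, 1, 4))
dropout_output, mask, softmax_output = bias_softmax_dropout(
    data, bias, axis=2, ratio=0.5, seed=0
)
```

The three return values mirror the operator's outputs: the softmax result is returned alongside the dropout result because the backward pass needs it.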
Attributes
axis: Apply softmax to elements for dimensions axis or higher. Default value is ?.
is_inner_broadcast (required): True if bias is broadcast across input for dimensions broadcast_axis to axis-1; otherwise bias is broadcast across input for dimensions 0 to broadcast_axis-1. Default value is ?.
seed: (Optional) Seed for the random generator. If not specified, one is generated automatically. Default value is ?.
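The two broadcast modes described by is_inner_broadcast can be pictured with NumPy shapes. This is a sketch under assumptions: the attention-style shapes, the names `inner_bias` and `outer_bias`, and the choice of broadcast_axis = 1 are all hypothetical. broadcast_axis is not an attribute listed here, so it is assumed to be derived from the input shapes.

```python
import numpy as np

# Hypothetical shapes: batch B, heads N, sequence length S.
B, N, S = 2, 3, 4
data = np.zeros((B, N, S, S))
axis = 3            # softmax over the last dimension
broadcast_axis = 1  # assumed; not an attribute of the operator

# is_inner_broadcast = true (assumed layout): bias has size 1 on
# dimensions broadcast_axis .. axis-1, here dimensions 1 and 2.
inner_bias = np.ones((B, 1, 1, S))

# is_inner_broadcast = false (assumed layout): bias is shared across
# dimensions 0 .. broadcast_axis-1, here the batch dimension.
outer_bias = np.ones((N, S, S))

inner_sum = data + inner_bias  # shape (B, N, S, S)
outer_sum = data + outer_bias  # shape (B, N, S, S)
```

In both cases the sum has the shape of data; the attribute only indicates which block of dimensions the bias is repeated over.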
Inputs
Between 2 and 3 inputs.
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias (or mask) as Tensor.
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input is not set, or if it is set to 0, the output is a simple copy of the input. If it is non-zero, the output is a random dropout of the scaled input, which is typically the case during training. This input is optional; if not specified, it defaults to 0.5.
Outputs
dropout_output (heterogeneous) - T: The dropout output.
mask (heterogeneous) - tensor(bool): The output mask of dropout.
softmax_output (heterogeneous) - T: The Softmax output for backward.
Examples