com.microsoft - BiasSoftmaxDropout

BiasSoftmaxDropout - 1 (com.microsoft)

Version

name: BiasSoftmaxDropout
domain: com.microsoft
since_version: 1
function:
support_level:
shape inference:

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Computes dropout_output, mask, softmax_output = Dropout(Softmax(data + bias), ratio). This operator is intended to specialize the Add + Softmax + Dropout pattern commonly found in transformer models.
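
The formula in the summary can be sketched in plain Python. This is a minimal illustration and not the operator's actual kernel: the helper names (`softmax`, `bias_softmax_dropout`) are hypothetical, the inputs are assumed to be 2-D nested lists with softmax taken over the last dimension, `bias` is assumed to have the same shape as `data` (no broadcasting), and the mask convention (true means kept) and the 1 / (1 - ratio) scaling are assumed to follow the standard ONNX Dropout operator.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def bias_softmax_dropout(data, bias, ratio=0.5, seed=None):
    """Sketch of Dropout(Softmax(data + bias), ratio) for 2-D nested lists.

    Assumptions (not confirmed by the operator docs): bias has the same
    shape as data, mask is True where an element is kept, and kept
    elements are scaled by 1 / (1 - ratio) as in ONNX Dropout.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - ratio)
    dropout_output, mask, softmax_output = [], [], []
    for data_row, bias_row in zip(data, bias):
        # Add + Softmax
        probs = softmax([d + b for d, b in zip(data_row, bias_row)])
        # Dropout: keep each element with probability (1 - ratio)
        keep = [rng.random() >= ratio for _ in probs]
        softmax_output.append(probs)
        mask.append(keep)
        dropout_output.append(
            [p * scale if k else 0.0 for p, k in zip(probs, keep)]
        )
    return dropout_output, mask, softmax_output


data = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
bias = [[0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
out, kept, probs = bias_softmax_dropout(data, bias, ratio=0.5, seed=0)
```

With ratio set to 0 in this sketch, every element is kept and dropout_output equals softmax_output.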

Attributes

axis: apply softmax to elements for dimensions axis or higher. Default value is ?.
is_inner_broadcast (required): true if bias is broadcast across input for dimensions broadcast_axis to axis-1; otherwise bias is broadcast across input for dimensions 0 to broadcast_axis-1. Default value is ?.
seed: (Optional) Seed for the random generator; if not specified, one is generated automatically. Default value is ?.
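
One way to read the axis attribute is that the input is viewed as a 2-D matrix, with the dimensions before axis collapsed into rows and the dimensions from axis onward collapsed into the columns that each softmax normalizes over. The sketch below only computes that 2-D view shape. The function name `softmax_2d_view` is hypothetical, and both this interpretation and the handling of negative axis values are assumptions, not confirmed behavior of the operator.

```python
import math


def softmax_2d_view(shape, axis):
    """Return (rows, cols) of the assumed 2-D view used for softmax.

    rows = product of dimensions before `axis`
    cols = product of dimensions from `axis` onward (normalized together)
    Negative axis values are assumed to count from the last dimension.
    """
    rank = len(shape)
    if axis < 0:
        axis += rank
    if not 0 <= axis < rank:
        raise ValueError("axis out of range for the given shape")
    rows = math.prod(shape[:axis])
    cols = math.prod(shape[axis:])
    return rows, cols


# Hypothetical attention-score tensor: (batch, heads, seq_len, seq_len)
rows, cols = softmax_2d_view((2, 8, 16, 16), axis=3)
```

Under this reading, choosing the last dimension as axis gives one softmax per attention row, which matches the transformer pattern the summary refers to.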

Inputs

Between 2 and 3 inputs.

data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias (or mask) as Tensor.
ratio (optional, heterogeneous) - T1: The ratio of random dropout, with value in [0, 1). If this input is not set, or is set to 0, the output is a simple copy of the input. If it is non-zero, the output is a random dropout of the scaled input, which is typically the case during training. This input is optional; if not specified, it defaults to 0.5.

Outputs

dropout_output (heterogeneous) - T: The dropout output.
mask (heterogeneous) - tensor(bool): The output mask of dropout.
softmax_output (heterogeneous) - T: The Softmax output for backward.
