com.microsoft - BiasSoftmax
name: BiasSoftmax
This version of the operator has been available since version 1 of domain com.microsoft.
Y = softmax(scores + bias) with simple broadcast on bias. Intended to specialize softmax(scores + additive_mask), a pattern commonly found in transformer models.
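The formula above can be illustrated with a small reference sketch in plain Python. This is not the operator's implementation: the function names `softmax_row` and `bias_softmax_2d` are hypothetical, and the sketch assumes the simplest case, where the scores are already a 2D list of rows and a single bias row is broadcast over every row.

```python
import math

def softmax_row(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def bias_softmax_2d(scores, bias_row):
    # Hypothetical helper: computes softmax(scores + bias) row by row,
    # with the same bias row added to (broadcast over) every row of scores.
    return [
        softmax_row([s + b for s, b in zip(row, bias_row)])
        for row in scores
    ]

scores = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
# A large negative bias acts as an additive mask on the last position.
mask = [0.0, 0.0, -1e9]
y = bias_softmax_2d(scores, mask)
```

With this mask, the last position in each row receives effectively zero probability and the remaining positions are renormalized among themselves, which is the additive-mask pattern the description refers to.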
axis: apply softmax to elements for dimensions axis or higher. Default value is
is_inner_broadcast (required): true if bias is broadcast across input for dimensions broadcast_axis to axis - 1; otherwise bias is broadcast across input for dimensions 0 to broadcast_axis - 1. Default value is
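One reading of "dimensions axis or higher" is that the input is treated as a 2D view: the dimensions before axis are collapsed into rows, and the dimensions from axis onward are collapsed into the elements normalized together. The sketch below computes that view's shape under this assumption; `softmax_view` is a hypothetical helper, not part of any library, and it only handles a non-negative axis.

```python
from math import prod

def softmax_view(shape, axis):
    # Hypothetical helper, assuming the flattening interpretation:
    # rows = product of dimensions before `axis`,
    # cols = product of dimensions from `axis` onward (softmax runs over these).
    return prod(shape[:axis]), prod(shape[axis:])

rows, cols = softmax_view((2, 3, 4), axis=1)
```

For an input of shape (2, 3, 4) and axis 1, this gives 2 independent softmax computations, each over 12 elements.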
data (heterogeneous) - T: The input data as Tensor.
bias (heterogeneous) - T: The bias (or mask) as Tensor.
output (heterogeneous) - T: The output.