This version of the operator has been available since version 9.
reduction: Type of reduction to apply to the loss: 'none', 'sum', or 'mean' (default). 'none': the output is the loss for each sample in the batch. 'sum': the per-sample losses are summed. 'mean': the sum of the per-sample losses is divided by batch_size.
Between 3 and 4 inputs.
dY (heterogeneous) - T: gradient of Y
log_prob (heterogeneous) - T: logsoftmax(logits), (N+1)-D input of shape (batch_size).
label (heterogeneous) - Tind: label is an N-D input whose shape should match that of logits. It is a tensor of nonnegative integers, where each element is the label for the corresponding element of the batch.
weight (optional, heterogeneous) - T: weight for each sample. Its shape is the same as that of label.
d_logits (heterogeneous) - T: gradient of logits
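
To make the relationship between the inputs and d_logits concrete, the following is a minimal pure-Python sketch of the gradient, not the operator's actual kernel. It assumes the simplest case: log_prob is a 2-D list of shape (batch_size, num_classes) and label is a 1-D list of length batch_size. The function name softmax_cross_entropy_loss_grad and its argument order are hypothetical, chosen only to mirror the inputs listed above. The 'mean' branch divides by batch_size as described for the reduction attribute; a real implementation may normalize differently when weight is supplied.

```python
import math


def softmax_cross_entropy_loss_grad(dY, log_prob, label, weight=None, reduction="mean"):
    """Illustrative gradient of softmax cross-entropy with respect to logits.

    Assumes log_prob has shape (batch_size, num_classes) and label has
    shape (batch_size,). dY is a scalar for 'sum' and 'mean', and a list
    of length batch_size for 'none'.
    """
    batch_size = len(log_prob)
    if weight is None:
        # No weights given: every sample counts equally.
        weight = [1.0] * batch_size

    d_logits = []
    for i, row in enumerate(log_prob):
        # Pick the upstream gradient that reaches sample i.
        if reduction == "none":
            upstream = dY[i]
        elif reduction == "sum":
            upstream = dY
        elif reduction == "mean":
            # Assumption: normalize by batch_size, as the attribute text states.
            upstream = dY / batch_size
        else:
            raise ValueError("reduction must be 'none', 'sum', or 'mean'")

        scale = upstream * weight[i]
        # d(loss_i)/d(logits_ic) = softmax_ic - 1[c == label_i],
        # where softmax_ic = exp(log_prob_ic).
        d_logits.append([
            scale * (math.exp(lp) - (1.0 if c == label[i] else 0.0))
            for c, lp in enumerate(row)
        ])
    return d_logits


# Usage: two samples, three classes, uniform predictions.
log_prob = [[math.log(1.0 / 3.0)] * 3 for _ in range(2)]
grad = softmax_cross_entropy_loss_grad(1.0, log_prob, [0, 2], reduction="mean")
```

Each row of the result sums to approximately zero, because the softmax probabilities sum to one and exactly one class per sample has 1 subtracted from it.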