# SoftmaxCrossEntropyLoss

## SoftmaxCrossEntropyLoss - 13

**Version**

• domain: main

• since_version: 13

• function: False

• support_level: SupportType.COMMON

• shape inference: True

This version of the operator has been available since version 13.

**Summary**

Loss function that measures the softmax cross entropy between 'scores' and 'labels'. This operator first computes a loss tensor whose shape is identical to the labels input. If the input is 2-D with shape (N, C), the loss tensor is an N-element vector L = (l_1, l_2, …, l_N). If the input is an N-D tensor with shape (N, C, D1, D2, …, Dk), the loss tensor L has shape (N, D1, D2, …, Dk), and L[i][j_1][j_2]…[j_k] denotes a scalar element of L. Once L is computed, this operator can optionally apply a reduction to it.

shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2, …, Dk) with K >= 1 in case of K-dimensional loss.

shape(labels): (N) where each value satisfies 0 <= labels[i] <= C-1, or (N, D1, D2, …, Dk) with K >= 1 in case of K-dimensional loss.

The loss for one sample, l_i, can be calculated as follows:

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk], where c is the class index taken from labels.

or

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk] * weights[c], if 'weights' is provided.

The loss is zero when the label value equals ignore_index:

l[i][d1][d2]…[dk] = 0, when labels[i][d1][d2]…[dk] = ignore_index

where:

p = Softmax(scores)

y = Log(p)

c = labels[i][d1][d2]…[dk]

Finally, L is optionally reduced: If reduction = 'none', the output is L with shape (N, D1, D2, …, Dk). If reduction = 'sum', the output is the scalar Sum(L). If reduction = 'mean', the output is the scalar ReduceMean(L), or, if weights is provided, ReduceSum(L) / ReduceSum(W), where the tensor W has shape (N, D1, D2, …, Dk) and W[i][d1][d2]…[dk] = weights[labels[i][d1][d2]…[dk]].
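The semantics above can be sketched in NumPy. This is an illustrative re-implementation, not the official ONNX code: the function name is invented, and the treatment of ignore_index under 'mean' reduction (dividing by the sum of the surviving weights) is an assumption, since the spec text only defines the weighted form explicitly.

```python
import numpy as np

def softmax_cross_entropy_loss(scores, labels, weights=None,
                               reduction="mean", ignore_index=None):
    """Sketch of SoftmaxCrossEntropyLoss: scores (N, C) or (N, C, D1, ..., Dk),
    labels (N) or (N, D1, ..., Dk)."""
    # Numerically stable log-softmax over the class axis (axis 1): y = Log(Softmax(scores)).
    x = scores - scores.max(axis=1, keepdims=True)
    log_prob = x - np.log(np.exp(x).sum(axis=1, keepdims=True))

    # Ignored positions: clamp their labels to a valid class for the gather,
    # then zero them out through the weight tensor W below.
    ignored = (labels == ignore_index) if ignore_index is not None \
        else np.zeros(labels.shape, dtype=bool)
    safe_labels = np.where(ignored, 0, labels)

    # l[i][d1]...[dk] = -y[i][c][d1]...[dk] with c = labels[i][d1]...[dk].
    loss = -np.take_along_axis(log_prob,
                               np.expand_dims(safe_labels, axis=1),
                               axis=1).squeeze(axis=1)

    # W[i][d1]...[dk] = weights[labels[i][d1]...[dk]] (all ones if no weights).
    w = np.ones(loss.shape) if weights is None else weights[safe_labels]
    w = np.where(ignored, 0.0, w)
    loss = loss * w

    if reduction == "none":
        return loss
    if reduction == "sum":
        return loss.sum()
    # 'mean': plain mean without weights; ReduceSum(L) / ReduceSum(W) with
    # weights (and, as an assumption here, with ignore_index).
    if weights is None and ignore_index is None:
        return loss.mean()
    return loss.sum() / w.sum()
```

For the 2-D case the unreduced output has shape (N,); a position whose label equals ignore_index contributes exactly zero loss.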

**Attributes**

• ignore_index: Specifies a target value that is ignored and does not contribute to the input gradient. It’s an optional value.

• reduction: Type of reduction to apply to the loss: none, sum, mean (default). 'none': no reduction is applied; 'sum': the output is summed; 'mean': the sum of the output is divided by the number of elements in the output. Default value is `'mean'`.

**Inputs**

Between 2 and 3 inputs.

• scores (heterogeneous) - T: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2 , …, Dk], where K is the number of dimensions.

• labels (heterogeneous) - Tind: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, …, Dk], where K is the number of dimensions. Label values shall be in the range [0, C). If ignore_index is specified, each label value must either be in the range [0, C) or equal ignore_index.

• weights (optional, heterogeneous) - T: A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.
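Because weights rescales each class's contribution, the 'mean' reduction divides by the summed per-position weights rather than the element count. A small NumPy illustration with hypothetical numbers:

```python
import numpy as np

# Hypothetical example: 2 samples, 3 classes, class 0 down-weighted.
scores = np.array([[2.0, 1.0, 0.0],
                   [0.0, 2.0, 1.0]])
labels = np.array([0, 1])
class_weights = np.array([0.2, 1.0, 1.0])

# Log-softmax over the class axis.
x = scores - scores.max(axis=1, keepdims=True)
y = x - np.log(np.exp(x).sum(axis=1, keepdims=True))

per_sample = -y[np.arange(2), labels]   # unweighted -log p[c] per sample
W = class_weights[labels]               # W[n] = weights[labels[n]]
L = per_sample * W                      # weighted loss tensor L

weighted_mean = L.sum() / W.sum()       # ReduceSum(L) / ReduceSum(W)
```

With all weights equal to one, W.sum() is just the element count and the formula reduces to the plain mean.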

**Outputs**

Between 1 and 2 outputs.

• output (heterogeneous) - T: Weighted loss float Tensor. If reduction is ‘none’, this has the shape of [batch_size], or [batch_size, D1, D2, …, Dk] in case of K-dimensional loss. Otherwise, it is a scalar.

• log_prob (optional, heterogeneous) - T: Log probability tensor. If the output of softmax is prob, its value is log(prob).
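The log_prob output equals log(Softmax(scores)). Implementations typically compute it in the shifted log-sum-exp form to avoid overflow for large scores; the two forms agree, as this NumPy check sketches:

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0]])

# Direct definition: log of the softmax probabilities.
prob = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
naive_log_prob = np.log(prob)

# Equivalent log-sum-exp form, stable even for large score magnitudes.
shifted = scores - scores.max(axis=1, keepdims=True)
log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
```

Every entry of log_prob is negative, since each softmax probability is strictly less than one.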

**Type Constraints**

• T in ( tensor(bfloat16), tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

• Tind in ( tensor(int32), tensor(int64) ): Constrain target to integer types.

**Differences**

Compared with version 12, this version only adds tensor(bfloat16) to the allowed types for T; the semantics are otherwise unchanged.

## SoftmaxCrossEntropyLoss - 12

**Version**

• domain: main

• since_version: 12

• function: False

• support_level: SupportType.COMMON

• shape inference: True

This version of the operator has been available since version 12.

**Summary**

Loss function that measures the softmax cross entropy between 'scores' and 'labels'. This operator first computes a loss tensor whose shape is identical to the labels input. If the input is 2-D with shape (N, C), the loss tensor is an N-element vector L = (l_1, l_2, …, l_N). If the input is an N-D tensor with shape (N, C, D1, D2, …, Dk), the loss tensor L has shape (N, D1, D2, …, Dk), and L[i][j_1][j_2]…[j_k] denotes a scalar element of L. Once L is computed, this operator can optionally apply a reduction to it.

shape(scores): (N, C) where C is the number of classes, or (N, C, D1, D2, …, Dk) with K >= 1 in case of K-dimensional loss.

shape(labels): (N) where each value satisfies 0 <= labels[i] <= C-1, or (N, D1, D2, …, Dk) with K >= 1 in case of K-dimensional loss.

The loss for one sample, l_i, can be calculated as follows:

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk], where c is the class index taken from labels.

or

l[i][d1][d2]…[dk] = -y[i][c][d1][d2]…[dk] * weights[c], if 'weights' is provided.

The loss is zero when the label value equals ignore_index:

l[i][d1][d2]…[dk] = 0, when labels[i][d1][d2]…[dk] = ignore_index

where:

p = Softmax(scores)

y = Log(p)

c = labels[i][d1][d2]…[dk]

Finally, L is optionally reduced: If reduction = 'none', the output is L with shape (N, D1, D2, …, Dk). If reduction = 'sum', the output is the scalar Sum(L). If reduction = 'mean', the output is the scalar ReduceMean(L), or, if weights is provided, ReduceSum(L) / ReduceSum(W), where the tensor W has shape (N, D1, D2, …, Dk) and W[i][d1][d2]…[dk] = weights[labels[i][d1][d2]…[dk]].

**Attributes**

• ignore_index: Specifies a target value that is ignored and does not contribute to the input gradient. It’s an optional value.

• reduction: Type of reduction to apply to the loss: none, sum, mean (default). 'none': no reduction is applied; 'sum': the output is summed; 'mean': the sum of the output is divided by the number of elements in the output. Default value is `'mean'`.

**Inputs**

Between 2 and 3 inputs.

• scores (heterogeneous) - T: The predicted outputs with shape [batch_size, class_size], or [batch_size, class_size, D1, D2 , …, Dk], where K is the number of dimensions.

• labels (heterogeneous) - Tind: The ground truth output tensor, with shape [batch_size], or [batch_size, D1, D2, …, Dk], where K is the number of dimensions. Label values shall be in the range [0, C). If ignore_index is specified, each label value must either be in the range [0, C) or equal ignore_index.

• weights (optional, heterogeneous) - T: A manual rescaling weight given to each class. If given, it has to be a 1D Tensor assigning weight to each of the classes. Otherwise, it is treated as if having all ones.

**Outputs**

Between 1 and 2 outputs.

• output (heterogeneous) - T: Weighted loss float Tensor. If reduction is ‘none’, this has the shape of [batch_size], or [batch_size, D1, D2, …, Dk] in case of K-dimensional loss. Otherwise, it is a scalar.

• log_prob (optional, heterogeneous) - T: Log probability tensor. If the output of softmax is prob, its value is log(prob).

**Type Constraints**

• T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.

• Tind in ( tensor(int32), tensor(int64) ): Constrain target to integer types.