com.microsoft - RemovePadding

RemovePadding - 1 (com.microsoft)

Version

  • name: RemovePadding (GitHub)

  • domain: com.microsoft

  • since_version: 1

  • function:

  • support_level:

  • shape inference:

This version of the operator has been available since version 1 of domain com.microsoft.

Summary

Compresses transformer input by removing padding. It assumes that padding is on the right side of each sequence.

The padded input has shape (batch_size, sequence_length, hidden_size). The two main outputs are output, with shape (total_tokens, hidden_size), and token_offset, with shape (batch_size, sequence_length). The operator also produces cumulated_seq_len and max_seq_len, described under Outputs.

token_offset holds the offsets of all non-padding tokens first, followed by the offsets of all padding tokens. It is a list of batch_size * sequence_length elements, reshaped to 2D for convenience of shape inference.
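The layout described above can be illustrated with a minimal pure-Python sketch. It is not the operator's implementation. The function name remove_padding is hypothetical, and the sketch assumes that each offset is an index into the flattened (batch_size * sequence_length) token axis, which is one reading of the description above.

```python
def remove_padding(padded, token_counts):
    """Illustrative reference only (hypothetical helper, not the real kernel).

    padded:       nested list with shape [batch_size][sequence_length][hidden_size]
    token_counts: number of non-padding tokens in each sequence
    """
    batch_size = len(padded)
    sequence_length = len(padded[0])

    kept, dropped = [], []
    for b in range(batch_size):
        for s in range(sequence_length):
            # Assumption: an offset is the token's index in the flattened
            # (batch_size * sequence_length) layout.
            flat = b * sequence_length + s
            # Padding is on the right, so positions >= token_counts[b] are padding.
            (kept if s < token_counts[b] else dropped).append(flat)

    # Gather the non-padding rows: shape (total_tokens, hidden_size).
    output = [padded[i // sequence_length][i % sequence_length] for i in kept]

    # Non-padding offsets first, then padding offsets, reshaped to 2D.
    flat_offsets = kept + dropped
    token_offset = [
        flat_offsets[r * sequence_length:(r + 1) * sequence_length]
        for r in range(batch_size)
    ]
    return output, token_offset
```

For example, with batch_size = 2, sequence_length = 4 and token counts [2, 3], the flat list of offsets under this assumption is [0, 1, 4, 5, 6, 2, 3, 7], which reshapes to [[0, 1, 4, 5], [6, 2, 3, 7]].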

Inputs

  • input (heterogeneous) - T: Input tensor with shape (batch_size, sequence_length, hidden_size)

  • sequence_token_count (heterogeneous) - M: Number of non-padding tokens in each sequence with shape (batch_size).

Outputs

  • output (heterogeneous) - T: Output tensor with shape (total_tokens, hidden_size)

  • token_offset (heterogeneous) - M: Offsets of non-padding tokens, followed by offsets of padding tokens. Its shape is (batch_size, sequence_length)

  • cumulated_seq_len (heterogeneous) - M: Cumulated sequence lengths. Its shape is (batch_size + 1)

  • max_seq_len (heterogeneous) - M: Max sequence length without padding. Its shape is (1)
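The two length-related outputs can be sketched from sequence_token_count alone. This is a hedged illustration, not the operator's code. The helper name sequence_length_outputs is hypothetical, and the sketch assumes that cumulated_seq_len is a prefix sum starting at 0, which is inferred from its (batch_size + 1) shape rather than stated above.

```python
from itertools import accumulate


def sequence_length_outputs(token_counts):
    """Illustrative reference only (hypothetical helper)."""
    # Assumption: a prefix sum with a leading 0, giving batch_size + 1 entries;
    # under this reading the last entry equals total_tokens.
    cumulated_seq_len = [0] + list(accumulate(token_counts))
    # Longest sequence once padding is excluded, kept as a 1-element list
    # to mirror the documented shape (1).
    max_seq_len = [max(token_counts)]
    return cumulated_seq_len, max_seq_len
```

Under these assumptions, token counts [2, 3] would give cumulated_seq_len = [0, 2, 5] and max_seq_len = [3].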

Examples