MaxPool
MaxPool - 12
Version
name: MaxPool (GitHub)
domain: main
since_version: 12
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 12.
Summary
MaxPool consumes an input tensor X and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling computes the maximum over all values of a subset of the input tensor, selected according to the kernel size, and downsamples the data into the output tensor Y for further processing. The output spatial shape is:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i] + 1)
or, if ceil_mode is enabled:
output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i] + 1)
where pad_shape[i] is the sum of the pads along axis i.
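As a rough illustration of the explicit-padding formula, the following sketch evaluates it for a single spatial axis. The function name `pooled_size` and its parameter names are hypothetical and chosen only for this example; `pad_total` stands for pad_shape[i].

```python
import math


def pooled_size(input_size, kernel_size, stride=1, pad_total=0,
                dilation=1, ceil_mode=False):
    """Output length along one spatial axis (hypothetical helper).

    pad_total is the sum of the begin and end pads on this axis.
    """
    # A dilated kernel spans (kernel_size - 1) * dilation + 1 input positions.
    effective_kernel = (kernel_size - 1) * dilation + 1
    value = (input_size + pad_total - effective_kernel) / stride + 1
    return math.ceil(value) if ceil_mode else math.floor(value)


# A 4-wide axis, kernel 3, stride 2: floor gives 1 window, ceil gives 2.
floor_size = pooled_size(4, 3, stride=2)
ceil_size = pooled_size(4, 3, stride=2, ceil_mode=True)
```

With ceil_mode enabled, the last window may extend past the end of the input, which is why one extra output position appears in this case.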
auto_pad is a DEPRECATED attribute. If you are currently using it, the output spatial shape is:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
For SAME_UPPER or SAME_LOWER, the pad shape is:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) - input_spatial_shape[i]
The output of each pooling window is the maximum of its elements, excluding padding.
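The auto_pad formulas can be checked numerically with a short sketch like the one below. The helper name `auto_pad_size` is an assumption made for this illustration; it returns the output length and the total padding for one axis, exactly as the formulas state them and without any additional clamping.

```python
import math


def auto_pad_size(mode, input_size, kernel_size, stride=1, dilation=1):
    """Return (output_size, pad_total) for one axis (hypothetical helper)."""
    effective_kernel = (kernel_size - 1) * dilation + 1
    if mode == "VALID":
        # VALID uses no padding.
        out = math.ceil((input_size - effective_kernel + 1) / stride)
        return out, 0
    if mode in ("SAME_UPPER", "SAME_LOWER"):
        out = math.ceil(input_size / stride)
        pad_total = (out - 1) * stride + effective_kernel - input_size
        return out, pad_total
    raise ValueError(f"unsupported auto_pad mode: {mode}")


# A 5-wide axis with kernel 3 and stride 2 under SAME_UPPER.
same_result = auto_pad_size("SAME_UPPER", 5, 3, stride=2)
```

For this input the sketch yields an output length of 3 and a total padding of 2, which is then divided between the two sides of the axis.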
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. Default value is 'NOTSET'.
ceil_mode: Whether to use ceil or floor (default) to compute the output shape. Default value is 0.
dilations: Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis. Each value can be any integer greater than or equal to 0 and represents the number of pixels added to the beginning or end of the corresponding axis. The pads format is [x1_begin, x2_begin…x1_end, x2_end,…], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
storage_order: The storage order of the tensor. 0 is row major, and 1 is column major. This attribute is used only to convert an n-tuple index value into a single integer value for producing the second output. Default value is 0.
strides: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
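To make the pads layout and the SAME_UPPER / SAME_LOWER split concrete, here is a minimal sketch that turns per-axis padding totals into a list in the [x1_begin, x2_begin…x1_end, x2_end,…] order. The function `split_pads` is a hypothetical helper, not part of any library.

```python
def split_pads(pad_totals, mode):
    """Split per-axis padding totals into an ONNX-style pads list.

    Hypothetical helper: returns all begin values first, then all end values.
    """
    begins, ends = [], []
    for total in pad_totals:
        small = total // 2      # the smaller half when total is odd
        large = total - small   # the half that receives the extra pixel
        if mode == "SAME_UPPER":
            begins.append(small)
            ends.append(large)
        elif mode == "SAME_LOWER":
            begins.append(large)
            ends.append(small)
        else:
            raise ValueError(f"unsupported auto_pad mode: {mode}")
    return begins + ends


# One pixel of padding per axis on a 2D input.
upper = split_pads([1, 1], "SAME_UPPER")
lower = split_pads([1, 1], "SAME_LOWER")
```

Here `upper` places the single pixel at the end of each axis and `lower` places it at the beginning, so the two lists are [0, 0, 1, 1] and [1, 1, 0, 0].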
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator. For the image case, the dimensions are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Between 1 and 2 outputs.
Y (heterogeneous) - T: Output data tensor from max pooling across the input tensor. Dimensions vary with the kernel, stride, and pad sizes. The floor of the computed dimension is used unless ceil_mode is enabled.
Indices (optional, heterogeneous) - I: Indices tensor from max pooling across the input tensor. The dimensions of the indices are the same as those of the output tensor. The values are the indices of the elements selected during pooling. The indices are computed as if the input were a flattened 1-D tensor, and they do not consider padding, so the values are in [0, N x C x D1 x … x Dn).
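The flattening of an index tuple into a single integer can be sketched with NumPy's `np.ravel_multi_index`. The wrapper `flat_index` is hypothetical, and treating storage_order=1 as column-major order over all input dimensions is an assumption; it agrees with the argmax examples in this document, which use N = C = 1.

```python
import numpy as np


def flat_index(coords, shape, storage_order=0):
    """Flatten an (n, c, d1, ..., dn) coordinate into one integer.

    Hypothetical helper: storage_order 0 is mapped to row-major ("C") order
    and 1 to column-major ("F") order.
    """
    order = "C" if storage_order == 0 else "F"
    return int(np.ravel_multi_index(coords, shape, order=order))


shape = (1, 1, 5, 5)
# The element at row 1, column 3 of a 5 x 5 image.
row_major = flat_index((0, 0, 1, 3), shape, storage_order=0)
col_major = flat_index((0, 0, 1, 3), shape, storage_order=1)
```

In row-major order the element is number 1 * 5 + 3 = 8, while in column-major order it is 3 * 5 + 1 = 16.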
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16), tensor(int8), tensor(uint8) ): Constrain input and output types to float and 8 bit tensors.
I in ( tensor(int64) ): Constrain index tensor to int64
Examples
_maxpool_2d_uint8
"""
input_shape: [1, 1, 5, 5]
output_shape: [1, 1, 5, 5]
pad_shape: [4, 4] -> [2, 2, 2, 2] by axis
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[5, 5],
pads=[2, 2, 2, 2],
)
x = np.array(
[
[
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25],
]
]
]
).astype(np.uint8)
y = np.array(
[
[
[
[13, 14, 15, 15, 15],
[18, 19, 20, 20, 20],
[23, 24, 25, 25, 25],
[23, 24, 25, 25, 25],
[23, 24, 25, 25, 25],
]
]
]
).astype(np.uint8)
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_uint8")
_maxpool_2d_precomputed_pads
"""
input_shape: [1, 1, 5, 5]
output_shape: [1, 1, 5, 5]
pad_shape: [4, 4] -> [2, 2, 2, 2] by axis
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[5, 5],
pads=[2, 2, 2, 2],
)
x = np.array(
[
[
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25],
]
]
]
).astype(np.float32)
y = np.array(
[
[
[
[13, 14, 15, 15, 15],
[18, 19, 20, 20, 20],
[23, 24, 25, 25, 25],
[23, 24, 25, 25, 25],
[23, 24, 25, 25, 25],
]
]
]
).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_precomputed_pads")
_maxpool_with_argmax_2d_precomputed_pads
"""
input_shape: [1, 1, 5, 5]
output_shape: [1, 1, 5, 5]
pad_shape: [4, 4] -> [2, 2, 2, 2] by axis
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y", "z"],
kernel_shape=[5, 5],
pads=[2, 2, 2, 2],
)
x = np.array(
[
[
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25],
]
]
]
).astype(np.float32)
y = np.array(
[
[
[
[13, 14, 15, 15, 15],
[18, 19, 20, 20, 20],
[23, 24, 25, 25, 25],
[23, 24, 25, 25, 25],
[23, 24, 25, 25, 25],
]
]
]
).astype(np.float32)
z = np.array(
[
[
[
[12, 13, 14, 14, 14],
[17, 18, 19, 19, 19],
[22, 23, 24, 24, 24],
[22, 23, 24, 24, 24],
[22, 23, 24, 24, 24],
]
]
]
).astype(np.int64)
expect(
node,
inputs=[x],
outputs=[y, z],
name="test_maxpool_with_argmax_2d_precomputed_pads",
)
_maxpool_2d_precomputed_strides
"""
input_shape: [1, 1, 5, 5]
output_shape: [1, 1, 2, 2]
"""
node = onnx.helper.make_node(
"MaxPool", inputs=["x"], outputs=["y"], kernel_shape=[2, 2], strides=[2, 2]
)
x = np.array(
[
[
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25],
]
]
]
).astype(np.float32)
y = np.array([[[[7, 9], [17, 19]]]]).astype(np.float32)
expect(
node, inputs=[x], outputs=[y], name="test_maxpool_2d_precomputed_strides"
)
_maxpool_with_argmax_2d_precomputed_strides
"""
input_shape: [1, 1, 5, 5]
output_shape: [1, 1, 2, 2]
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y", "z"],
kernel_shape=[2, 2],
strides=[2, 2],
storage_order=1,
)
x = np.array(
[
[
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25],
]
]
]
).astype(np.float32)
y = np.array([[[[7, 9], [17, 19]]]]).astype(np.float32)
z = np.array([[[[6, 16], [8, 18]]]]).astype(np.int64)
expect(
node,
inputs=[x],
outputs=[y, z],
name="test_maxpool_with_argmax_2d_precomputed_strides",
)
_maxpool_2d_precomputed_same_upper
"""
input_shape: [1, 1, 5, 5]
output_shape: [1, 1, 3, 3]
pad_shape: [2, 2] -> [1, 1, 1, 1] by axis
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[3, 3],
strides=[2, 2],
auto_pad="SAME_UPPER",
)
x = np.array(
[
[
[
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25],
]
]
]
).astype(np.float32)
y = np.array([[[[7, 9, 10], [17, 19, 20], [22, 24, 25]]]]).astype(np.float32)
expect(
node, inputs=[x], outputs=[y], name="test_maxpool_2d_precomputed_same_upper"
)
_maxpool_1d_default
"""
input_shape: [1, 3, 32]
output_shape: [1, 3, 31]
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[2],
)
x = np.random.randn(1, 3, 32).astype(np.float32)
x_shape = np.shape(x)
kernel_shape = [2]
strides = [1]
out_shape = get_output_shape("VALID", x_shape[2:], kernel_shape, strides)
padded = x
y = pool(padded, x_shape, kernel_shape, strides, out_shape, [0], "MAX")
expect(node, inputs=[x], outputs=[y], name="test_maxpool_1d_default")
_maxpool_2d_default
"""
input_shape: [1, 3, 32, 32]
output_shape: [1, 3, 31, 31]
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[2, 2],
)
x = np.random.randn(1, 3, 32, 32).astype(np.float32)
x_shape = np.shape(x)
kernel_shape = (2, 2)
strides = (1, 1)
out_shape = get_output_shape("VALID", x_shape[2:], kernel_shape, strides)
padded = x
y = pool(padded, x_shape, kernel_shape, strides, out_shape, (0, 0), "MAX")
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_default")
_maxpool_3d_default
"""
input_shape: [1, 3, 32, 32, 32]
output_shape: [1, 3, 31, 31, 31]
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[2, 2, 2],
)
x = np.random.randn(1, 3, 32, 32, 32).astype(np.float32)
x_shape = np.shape(x)
kernel_shape = [2, 2, 2]
strides = [1, 1, 1]
out_shape = get_output_shape("VALID", x_shape[2:], kernel_shape, strides)
padded = x
y = pool(padded, x_shape, kernel_shape, strides, out_shape, [0, 0, 0], "MAX")
expect(node, inputs=[x], outputs=[y], name="test_maxpool_3d_default")
_maxpool_2d_same_upper
"""
input_shape: [1, 3, 32, 32]
output_shape: [1, 3, 32, 32]
pad_shape: [1, 1] -> [0, 1, 0, 1] by axis
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[2, 2],
auto_pad="SAME_UPPER",
)
x = np.random.randn(1, 3, 32, 32).astype(np.float32)
x_shape = np.shape(x)
kernel_shape = (2, 2)
strides = (1, 1)
out_shape = get_output_shape("SAME_UPPER", x_shape[2:], kernel_shape, strides)
pad_shape = get_pad_shape(
"SAME_UPPER", x_shape[2:], kernel_shape, strides, out_shape
)
pad_top = pad_shape[0] // 2
pad_bottom = pad_shape[0] - pad_top
pad_left = pad_shape[1] // 2
pad_right = pad_shape[1] - pad_left
padded = np.pad(
x,
((0, 0), (0, 0), (pad_top, pad_bottom), (pad_left, pad_right)),
mode="constant",
constant_values=np.nan,
)
y = pool(padded, x_shape, kernel_shape, strides, out_shape, pad_shape, "MAX")
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_same_upper")
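The `pool` helper used in these examples is not shown in this section. The sketch below only illustrates why padding with NaN is convenient, under the assumption that the helper reduces each window with a NaN-aware maximum such as `np.nanmax`: padded cells can then never be selected, even when every real value is negative.

```python
import numpy as np

# All real values are negative, so zero padding would wrongly win the max.
x = np.array([[-1.0, -5.0], [-3.0, -2.0]], dtype=np.float32)

# Pad one row at the bottom and one column on the right.
nan_padded = np.pad(x, ((0, 1), (0, 1)), mode="constant", constant_values=np.nan)
zero_padded = np.pad(x, ((0, 1), (0, 1)), mode="constant", constant_values=0.0)

# The 2 x 2 window anchored at (1, 1) covers one real value and three pads.
nan_window_max = float(np.nanmax(nan_padded[1:3, 1:3]))
zero_window_max = float(np.max(zero_padded[1:3, 1:3]))
```

The NaN-padded window returns the real value -2.0, whereas the zero-padded window returns 0.0, a value that does not exist in the input.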
_maxpool_2d_same_lower
"""
input_shape: [1, 3, 32, 32]
output_shape: [1, 3, 32, 32]
pad_shape: [1, 1] -> [1, 0, 1, 0] by axis
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[2, 2],
auto_pad="SAME_LOWER",
)
x = np.random.randn(1, 3, 32, 32).astype(np.float32)
x_shape = np.shape(x)
kernel_shape = (2, 2)
strides = (1, 1)
out_shape = get_output_shape("SAME_LOWER", x_shape[2:], kernel_shape, strides)
pad_shape = get_pad_shape(
"SAME_LOWER", x_shape[2:], kernel_shape, strides, out_shape
)
pad_bottom = pad_shape[0] // 2
pad_top = pad_shape[0] - pad_bottom
pad_right = pad_shape[1] // 2
pad_left = pad_shape[1] - pad_right
padded = np.pad(
x,
((0, 0), (0, 0), (pad_top, pad_bottom), (pad_left, pad_right)),
mode="constant",
constant_values=np.nan,
)
y = pool(padded, x_shape, kernel_shape, strides, out_shape, pad_shape, "MAX")
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_same_lower")
_maxpool_2d_pads
"""
input_shape: [1, 3, 28, 28]
output_shape: [1, 3, 30, 30]
pad_shape: [4, 4] -> [2, 2, 2, 2] by axis
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[3, 3],
pads=[2, 2, 2, 2],
)
x = np.random.randn(1, 3, 28, 28).astype(np.float32)
x_shape = np.shape(x)
kernel_shape = (3, 3)
strides = (1, 1)
pad_bottom = pad_top = pad_right = pad_left = 2
pad_shape = [pad_top + pad_bottom, pad_left + pad_right]
out_shape = get_output_shape(
"VALID", np.add(x_shape[2:], pad_shape), kernel_shape, strides
)
padded = np.pad(
x,
((0, 0), (0, 0), (pad_top, pad_bottom), (pad_left, pad_right)),
mode="constant",
constant_values=np.nan,
)
y = pool(padded, x_shape, kernel_shape, strides, out_shape, pad_shape, "MAX")
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_pads")
_maxpool_2d_strides
"""
input_shape: [1, 3, 32, 32]
output_shape: [1, 3, 10, 10]
"""
node = onnx.helper.make_node(
"MaxPool", inputs=["x"], outputs=["y"], kernel_shape=[5, 5], strides=[3, 3]
)
x = np.random.randn(1, 3, 32, 32).astype(np.float32)
x_shape = np.shape(x)
kernel_shape = (5, 5)
strides = (3, 3)
out_shape = get_output_shape("VALID", x_shape[2:], kernel_shape, strides)
padded = x
y = pool(padded, x_shape, kernel_shape, strides, out_shape, (0, 0), "MAX")
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_strides")
_maxpool_2d_ceil
"""
input_shape: [1, 1, 4, 4]
output_shape: [1, 1, 2, 2]
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[3, 3],
strides=[2, 2],
ceil_mode=True,
)
x = np.array(
[
[
[
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
]
]
]
).astype(np.float32)
y = np.array([[[[11, 12], [15, 16]]]]).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_ceil")
_maxpool_2d_dilations
"""
input_shape: [1, 1, 4, 4]
output_shape: [1, 1, 2, 2]
"""
node = onnx.helper.make_node(
"MaxPool",
inputs=["x"],
outputs=["y"],
kernel_shape=[2, 2],
strides=[1, 1],
dilations=[2, 2],
)
x = np.array(
[
[
[
[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
]
]
]
).astype(np.float32)
y = np.array([[[[11, 12], [15, 16]]]]).astype(np.float32)
expect(node, inputs=[x], outputs=[y], name="test_maxpool_2d_dilations")
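The examples above depend on helpers (`expect`, `pool`, `get_output_shape`, `get_pad_shape`) that are defined outside this section. As a self-contained cross-check, the following naive NumPy sketch reproduces the precomputed 2D results. The function `maxpool2d_reference` is hypothetical and is not the helper used by the test suite; it assumes every window contains at least one real input element.

```python
import math

import numpy as np


def maxpool2d_reference(x, kernel_shape, strides=(1, 1), pads=(0, 0, 0, 0),
                        dilations=(1, 1), ceil_mode=False):
    """Naive max pooling over an (N, C, H, W) array (hypothetical helper)."""
    n, c, h, w = x.shape
    kh, kw = kernel_shape
    sh, sw = strides
    dh, dw = dilations
    top, left, bottom, right = pads  # [x1_begin, x2_begin, x1_end, x2_end]
    rnd = math.ceil if ceil_mode else math.floor
    out_h = rnd((h + top + bottom - ((kh - 1) * dh + 1)) / sh + 1)
    out_w = rnd((w + left + right - ((kw - 1) * dw + 1)) / sw + 1)
    # Start from -inf so that padded positions can never be the maximum.
    y = np.full((n, c, out_h, out_w), -np.inf)
    for oh in range(out_h):
        for ow in range(out_w):
            for i in range(kh):
                for j in range(kw):
                    ih = oh * sh - top + i * dh
                    iw = ow * sw - left + j * dw
                    # Skip positions that fall in the padding or past the edge.
                    if 0 <= ih < h and 0 <= iw < w:
                        y[:, :, oh, ow] = np.maximum(y[:, :, oh, ow],
                                                     x[:, :, ih, iw])
    return y.astype(x.dtype)


x4 = np.arange(1, 17, dtype=np.float32).reshape(1, 1, 4, 4)
y_ceil = maxpool2d_reference(x4, (3, 3), strides=(2, 2), ceil_mode=True)
y_dilated = maxpool2d_reference(x4, (2, 2), dilations=(2, 2))
```

Both calls return [[11, 12], [15, 16]] for the 4 x 4 input, matching the ceil_mode and dilations examples.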
Differences
The summary, shape formulas, inputs, and outputs are unchanged from version 11. The following entries differ:
auto_pad: the version 11 wording "SAME_UPPER or SAME_LOWER mean pad the input so that the output spatial size match the input. In case of odd number add the extra padding at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID mean no padding." is replaced by "SAME_UPPER or SAME_LOWER mean pad the input so that output_shape[i] = ceil(input_shape[i] / strides[i]) for each axis i. The padding is split between the two sides equally or almost equally (depending on whether it is even or odd). In case the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER."
storage_order: the sentence "This attribute is used only to convert an n-tuple index value into a single integer value for producing the second output." is added.
T: tensor(int8) and tensor(uint8) are added to the allowed types, and the constraint description changes from "float tensors" to "float and 8 bit tensors".
MaxPool - 11
Version
name: MaxPool (GitHub)
domain: main
since_version: 11
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 11.
Summary
MaxPool consumes an input tensor X and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling computes the maximum over all values of a subset of the input tensor, selected according to the kernel size, and downsamples the data into the output tensor Y for further processing. The output spatial shape is:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i] + 1)
or, if ceil_mode is enabled:
output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i] + 1)
where pad_shape[i] is the sum of the pads along axis i.
auto_pad is a DEPRECATED attribute. If you are currently using it, the output spatial shape is:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
For SAME_UPPER or SAME_LOWER, the pad shape is:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) - input_spatial_shape[i]
The output of each pooling window is the maximum of its elements, excluding padding.
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER pad the input so that the output spatial size matches the input. If the padding is an odd number, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
ceil_mode: Whether to use ceil or floor (default) to compute the output shape. Default value is 0.
dilations: Dilation value along each spatial axis of the filter. If not present, the dilation defaults to 1 along each spatial axis.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis. Each value can be any integer greater than or equal to 0 and represents the number of pixels added to the beginning or end of the corresponding axis. The pads format is [x1_begin, x2_begin…x1_end, x2_end,…], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
storage_order: The storage order of the tensor. 0 is row major, and 1 is column major. Default value is 0.
strides: Stride along each spatial axis. If not present, the stride defaults to 1 along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator. For the image case, the dimensions are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Between 1 and 2 outputs.
Y (heterogeneous) - T: Output data tensor from max pooling across the input tensor. Dimensions vary with the kernel, stride, and pad sizes. The floor of the computed dimension is used unless ceil_mode is enabled.
Indices (optional, heterogeneous) - I: Indices tensor from max pooling across the input tensor. The dimensions of the indices are the same as those of the output tensor. The values are the indices of the elements selected during pooling. The indices are computed as if the input were a flattened 1-D tensor, and they do not consider padding, so the values are in [0, N x C x D1 x … x Dn).
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
I in ( tensor(int64) ): Constrain index tensor to int64
Differences
Compared with the preceding version of the operator, the summary, the shape formulas, and the auto_pad, ceil_mode, and kernel_shape entries are unchanged. The following entry differs in the portion of the comparison shown here:
dilations: the sentence "If not present, the dilation defaults to 1 along each spatial axis." is added.
57  58  defaults to 0 along start and end of each spatial axis.  defaults to 0 along start and end of each spatial axis. 
58  59  * **storage_order**:  * **storage_order**: 
59  60  The storage order of the tensor. 0 is row major, and 1 is column  The storage order of the tensor. 0 is row major, and 1 is column 
60  61  major. Default value is 0.  major. Default value is 0. 
61  62  * **strides**:  * **strides**: 
62  63  Stride along each spatial axis. 

64  to 1 along each spatial axis.  
63  65 


64  66  **Inputs**  **Inputs** 
65  67 


66  68  * **X** (heterogeneous)  **T**:  * **X** (heterogeneous)  **T**: 
67  69  Input data tensor from the previous operator; dimensions for image  Input data tensor from the previous operator; dimensions for image 
68  70  case are (N x C x H x W), where N is the batch size, C is the number  case are (N x C x H x W), where N is the batch size, C is the number 
69  71  of channels, and H and W are the height and the width of the data.  of channels, and H and W are the height and the width of the data. 
70  72  For non image case, the dimensions are in the form of (N x C x D1 x  For non image case, the dimensions are in the form of (N x C x D1 x 
71  73  D2 ... Dn), where N is the batch size. Optionally, if dimension  D2 ... Dn), where N is the batch size. Optionally, if dimension 
72  74  denotation is in effect, the operation expects the input data tensor  denotation is in effect, the operation expects the input data tensor 
73  75  to arrive with the dimension denotation of [DATA_BATCH,  to arrive with the dimension denotation of [DATA_BATCH, 
74  76  DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...].  DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE ...]. 
75  77 


76  78  **Outputs**  **Outputs** 
77  79 


78  80  Between 1 and 2 outputs.  Between 1 and 2 outputs. 
79  81 


80  82  * **Y** (heterogeneous)  **T**:  * **Y** (heterogeneous)  **T**: 
81  83  Output data tensor from average or max pooling across the input  Output data tensor from average or max pooling across the input 
82  84  tensor. Dimensions will vary based on various kernel, stride, and  tensor. Dimensions will vary based on various kernel, stride, and 
83  85  pad sizes. Floor value of the dimension is used  pad sizes. Floor value of the dimension is used 
84  86  * **Indices** (optional, heterogeneous)  **I**:  * **Indices** (optional, heterogeneous)  **I**: 
85  87  Indices tensor from max pooling across the input tensor. The  Indices tensor from max pooling across the input tensor. The 
86  88  dimensions of indices are the same as output tensor. The values in  dimensions of indices are the same as output tensor. The values in 
87  89  indices of are the indices of the selected values during pooling.  indices of are the indices of the selected values during pooling. 
88  90  The indices are computed as flatten 1D tensor, and the indices do  The indices are computed as flatten 1D tensor, and the indices do 
89  91  not consider padding. So the values in indices are in [0, N x C x D1  not consider padding. So the values in indices are in [0, N x C x D1 
90  92  x ... x Dn).  x ... x Dn). 
91  93 


92  94  **Type Constraints**  **Type Constraints** 
93  95 


94  96  * **T** in (  * **T** in ( 
95  97  tensor(double),  tensor(double), 
96  98  tensor(float),  tensor(float), 
97  99  tensor(float16)  tensor(float16) 
98  100  ):  ): 
99  101  Constrain input and output types to float tensors.  Constrain input and output types to float tensors. 
100  102  * **I** in (  * **I** in ( 
101  103  tensor(int64)  tensor(int64) 
102  104  ):  ): 
103  105  Constrain index tensor to int64  Constrain index tensor to int64 
MaxPool - 10
Version
name: MaxPool (GitHub)
domain: main
since_version: 10
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 10.
Summary
MaxPool consumes an input tensor X and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling consists of computing the max over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape is:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i] + 1)
or
output_spatial_shape[i] = ceil((input_spatial_shape[i] + pad_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1)) / strides_spatial_shape[i] + 1)
if ceil_mode is enabled.
* pad_shape[i] is the sum of the pads along axis i
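As a rough illustration of the two formulas above, the following Python sketch evaluates them for explicit padding. The function name `maxpool_output_shape` and its fallbacks (strides and dilations of 1, pads of 0 when omitted) are assumptions made for this example, not part of the operator definition.

```python
import math


def maxpool_output_shape(input_shape, kernel_shape, strides=None, pads=None,
                         dilations=None, ceil_mode=0):
    """Spatial output shape for explicit padding (auto_pad = NOTSET).

    input_shape and kernel_shape list the spatial axes only (no N, C).
    pads uses the layout [x1_begin, x2_begin, ..., x1_end, x2_end, ...].
    """
    n = len(input_shape)
    strides = strides or [1] * n        # assumed fallback for this sketch
    dilations = dilations or [1] * n    # assumed fallback for this sketch
    pads = pads or [0] * (2 * n)
    rounding = math.ceil if ceil_mode else math.floor

    out = []
    for i in range(n):
        pad_total = pads[i] + pads[i + n]  # pad_shape[i]
        effective_kernel = (kernel_shape[i] - 1) * dilations[i] + 1
        out.append(rounding(
            (input_shape[i] + pad_total - effective_kernel) / strides[i] + 1
        ))
    return out


print(maxpool_output_shape([32, 32], [3, 3], strides=[2, 2]))               # [15, 15]
print(maxpool_output_shape([32, 32], [3, 3], strides=[2, 2], ceil_mode=1))  # [16, 16]
```

For a 32 x 32 input with a 3 x 3 kernel and stride 2, the unrounded value is 15.5 per axis, so the floor variant yields 15 and the ceil variant yields 16.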
auto_pad is a DEPRECATED attribute. If you are using it currently, the output spatial shape is:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
For SAME_UPPER or SAME_LOWER, the pad shape is:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) - input_spatial_shape[i]
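The SAME_UPPER / SAME_LOWER arithmetic can be sketched for a single axis as below. The helper name `same_auto_pad` is hypothetical, and clamping a negative total to 0 is an assumption of this sketch rather than something the text above states. The begin/end split follows the auto_pad attribute description: the extra pixel goes at the end for SAME_UPPER and at the beginning for SAME_LOWER.

```python
import math


def same_auto_pad(input_size, kernel_size, stride, dilation=1, upper=True):
    """Return (output_size, (pad_begin, pad_end)) for one spatial axis."""
    output_size = math.ceil(input_size / stride)
    effective_kernel = (kernel_size - 1) * dilation + 1
    total = (output_size - 1) * stride + effective_kernel - input_size
    total = max(total, 0)  # assumption: never pad by a negative amount
    small = total // 2
    large = total - small
    # SAME_UPPER puts the extra pixel at the end, SAME_LOWER at the beginning.
    return output_size, ((small, large) if upper else (large, small))


print(same_auto_pad(6, 3, 2))               # (3, (0, 1))
print(same_auto_pad(6, 3, 2, upper=False))  # (3, (1, 0))
```

With an input of 6, a kernel of 3 and a stride of 2, the total padding is 1, so the two modes differ only in which side receives it.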
The output of each pooling window is the maximum of the elements in the window, excluding padding.
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER pad the input so that the output spatial size matches the input. If the total padding is odd, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
ceil_mode: Whether to use ceil or floor (default) to compute the output shape. Default value is 0.
dilations: Dilation value along each spatial axis of filter.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end part of the corresponding axis. The pads format should be [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
storage_order: The storage order of the tensor. 0 is row major, and 1 is column major. Default value is 0.
strides: Stride along each spatial axis.
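The pads layout lists all begin values first and all end values second, which is easy to misread as interleaved pairs. A minimal sketch of unpacking it follows; the helper name `split_pads` is hypothetical.

```python
def split_pads(pads):
    """Turn [x1_begin, x2_begin, ..., x1_end, x2_end, ...] into per-axis pairs."""
    if len(pads) % 2 != 0:
        raise ValueError("pads must hold one begin and one end value per axis")
    if any(p < 0 for p in pads):
        raise ValueError("pads values must be greater than or equal to 0")
    n = len(pads) // 2
    return [(pads[i], pads[i + n]) for i in range(n)]


pairs = split_pads([1, 2, 3, 4])
print(pairs)                                  # [(1, 3), (2, 4)]
print([begin + end for begin, end in pairs])  # pad_shape: [4, 6]
```

For two spatial axes, pads = [1, 2, 3, 4] means axis 1 is padded by 1 at the beginning and 3 at the end, and axis 2 by 2 and 4.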
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Between 1 and 2 outputs.
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used.
Indices (optional, heterogeneous) - I: Indices tensor from max pooling across the input tensor. The dimensions of Indices are the same as those of the output tensor. The values in Indices are the indices of the selected values during pooling. The indices are computed as if the input were a flattened 1-D tensor, and they do not consider padding, so the values in Indices are in [0, N x C x D1 x … x Dn).
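To make the flattened indexing concrete, the sketch below computes the index of one element inside a single unpadded H x W plane for both storage orders. It is restricted to one plane on purpose: how the N and C offsets combine is not spelled out above, so that part is left out. The function name `flat_index` is hypothetical, and applying storage_order to the two spatial axes is this sketch's reading of the attribute.

```python
def flat_index(h, w, height, width, storage_order=0):
    """Flattened index of element (h, w) in an unpadded height x width plane."""
    if not (0 <= h < height and 0 <= w < width):
        raise IndexError("position lies outside the unpadded input")
    if storage_order == 0:
        return h * width + w   # row major: rows are contiguous
    return w * height + h      # column major: columns are contiguous


print(flat_index(1, 3, 5, 5, storage_order=0))  # 8
print(flat_index(1, 3, 5, 5, storage_order=1))  # 16
```

Because the coordinates refer to the unpadded input, the result always stays below height * width regardless of the pads attribute.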
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
I in ( tensor(int64) ): Constrain index tensor to int64
Differences
Compared with MaxPool - 8, this version changes the following; all other text is identical:
* Output shape: kernel_spatial_shape[i] is replaced by the dilated kernel extent ((kernel_spatial_shape[i] - 1) * dilations[i] + 1) in the floor formula, in the VALID formula, and in the SAME_UPPER or SAME_LOWER pad_shape formula.
* Output shape: a ceil variant of the formula is added, used if ceil_mode is enabled.
* Attributes: ceil_mode is added ("Whether to use ceil or floor (default) to compute the output shape. Default value is 0.").
* Attributes: dilations is added ("Dilation value along each spatial axis of filter.").
MaxPool - 8
Version
name: MaxPool (GitHub)
domain: main
since_version: 8
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 8.
Summary
MaxPool consumes an input tensor X and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling consists of computing the max over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape is:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
* pad_shape[i] is the sum of the pads along axis i
auto_pad is a DEPRECATED attribute. If you are using it currently, the output spatial shape is:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
For SAME_UPPER or SAME_LOWER, the pad shape is:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
The output of each pooling window is the maximum of the elements in the window, excluding padding.
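The effect of excluding padding is easiest to see on negative data, where zero-filled padding would otherwise win the comparison. The following one-dimensional sketch uses a hypothetical function name and argument names, and assumes each pad is smaller than the kernel so that no window is empty.

```python
def maxpool_1d(values, kernel, stride=1, pad_begin=0, pad_end=0):
    """Max pooling over a 1-D list; padded positions never contribute a value."""
    n = len(values)
    # floor((input + pad_shape - kernel) / stride + 1), in integer arithmetic
    out_len = (n + pad_begin + pad_end - kernel) // stride + 1
    result = []
    for o in range(out_len):
        start = o * stride - pad_begin
        # Keep only positions that fall inside the real input.
        window = [values[j] for j in range(start, start + kernel) if 0 <= j < n]
        result.append(max(window))
    return result


print(maxpool_1d([-3, -1, -2, -5], kernel=3, pad_begin=1, pad_end=1))
# [-1, -1, -1, -2]
```

If the padded positions were treated as zeros, the first and last outputs here would be 0 instead of -1 and -2.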
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER pad the input so that the output spatial size matches the input. If the total padding is odd, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end part of the corresponding axis. The pads format should be [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
storage_order: The storage order of the tensor. 0 is row major, and 1 is column major. Default value is 0.
strides: Stride along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Between 1 and 2 outputs.
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used.
Indices (optional, heterogeneous) - I: Indices tensor from max pooling across the input tensor. The dimensions of Indices are the same as those of the output tensor. The values in Indices are the indices of the selected values during pooling. The indices are computed as if the input were a flattened 1-D tensor, and they do not consider padding, so the values in Indices are in [0, N x C x D1 x … x Dn).
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.
I in ( tensor(int64) ): Constrain index tensor to int64
Differences
Compared with MaxPool - 1, this version adds the following; all other text is identical:
* Attributes: storage_order is added ("The storage order of the tensor. 0 is row major, and 1 is column major. Default value is 0.").
* Outputs: the operator now has between 1 and 2 outputs. The optional output Indices (type I) is added: "Indices tensor from max pooling across the input tensor. The dimensions of indices are the same as output tensor. The values in indices of are the indices of the selected values during pooling. The indices are computed as flatten 1D tensor, and the indices do not consider padding. So the values in indices are in [0, N x C x D1 x ... x Dn)."
* Type Constraints: I in ( tensor(int64) ) is added ("Constrain index tensor to int64").
MaxPool - 1
Version
name: MaxPool (GitHub)
domain: main
since_version: 1
function: False
support_level: SupportType.COMMON
shape inference: True
This version of the operator has been available since version 1.
Summary
MaxPool consumes an input tensor X and applies max pooling across the tensor according to kernel sizes, stride sizes, and pad lengths. Max pooling consists of computing the max over all values of a subset of the input tensor according to the kernel size and downsampling the data into the output tensor Y for further processing. The output spatial shape is:
output_spatial_shape[i] = floor((input_spatial_shape[i] + pad_shape[i] - kernel_spatial_shape[i]) / strides_spatial_shape[i] + 1)
* pad_shape[i] is the sum of the pads along axis i
auto_pad is a DEPRECATED attribute. If you are using it currently, the output spatial shape is:
VALID: output_spatial_shape[i] = ceil((input_spatial_shape[i] - kernel_spatial_shape[i] + 1) / strides_spatial_shape[i])
SAME_UPPER or SAME_LOWER: output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides_spatial_shape[i])
For SAME_UPPER or SAME_LOWER, the pad shape is:
pad_shape[i] = (output_spatial_shape[i] - 1) * strides_spatial_shape[i] + kernel_spatial_shape[i] - input_spatial_shape[i]
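The VALID formula looks different from the explicit-padding formula, but with pad_shape[i] = 0 the two agree whenever the kernel fits inside the input. The brute-force check below is a sketch with hypothetical function names; it uses integer arithmetic to avoid floating-point rounding and only covers the small ranges it loops over.

```python
def explicit_no_pad(input_size, kernel, stride):
    # floor((input + 0 - kernel) / stride + 1), using floor division
    return (input_size - kernel) // stride + 1


def valid_auto_pad(input_size, kernel, stride):
    # ceil((input - kernel + 1) / stride), written as negated floor division
    return -(-(input_size - kernel + 1) // stride)


mismatches = [
    (n, k, s)
    for n in range(1, 40)
    for k in range(1, n + 1)   # kernel no larger than the input
    for s in range(1, 8)
    if explicit_no_pad(n, k, s) != valid_auto_pad(n, k, s)
]
print(mismatches)  # []
```

The empty list reflects the integer identity ceil((m + 1) / s) = floor(m / s) + 1 for m >= 0, applied with m = input_size - kernel.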
The output of each pooling window is the maximum of the elements in the window, excluding padding.
Attributes
auto_pad: auto_pad must be either NOTSET, SAME_UPPER, SAME_LOWER or VALID. The default value is NOTSET, which means explicit padding is used. SAME_UPPER or SAME_LOWER pad the input so that the output spatial size matches the input. If the total padding is odd, the extra padding is added at the end for SAME_UPPER and at the beginning for SAME_LOWER. VALID means no padding. Default value is 'NOTSET'.
kernel_shape (required): The size of the kernel along each axis.
pads: Padding for the beginning and ending along each spatial axis; it can take any value greater than or equal to 0. The value represents the number of pixels added to the beginning and end part of the corresponding axis. The pads format should be [x1_begin, x2_begin, …, x1_end, x2_end, …], where xi_begin is the number of pixels added at the beginning of axis i and xi_end is the number of pixels added at the end of axis i. This attribute cannot be used simultaneously with the auto_pad attribute. If not present, the padding defaults to 0 along the start and end of each spatial axis.
strides: Stride along each spatial axis.
Inputs
X (heterogeneous) - T: Input data tensor from the previous operator; dimensions for the image case are (N x C x H x W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data. For the non-image case, the dimensions are in the form of (N x C x D1 x D2 … Dn), where N is the batch size. Optionally, if dimension denotation is in effect, the operation expects the input data tensor to arrive with the dimension denotation of [DATA_BATCH, DATA_CHANNEL, DATA_FEATURE, DATA_FEATURE …].
Outputs
Y (heterogeneous) - T: Output data tensor from average or max pooling across the input tensor. Dimensions will vary based on various kernel, stride, and pad sizes. Floor value of the dimension is used.
Type Constraints
T in ( tensor(double), tensor(float), tensor(float16) ): Constrain input and output types to float tensors.