RoiAlign

RoiAlign - 16

Version

name: RoiAlign (GitHub)
domain: main
since_version: 16
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 16.

Summary

Region of Interest (RoI) align operation described in the [Mask R-CNN paper](https://arxiv.org/abs/1703.06870). RoiAlign consumes an input tensor X and regions of interest (rois) and applies pooling across each RoI; it produces a 4-D tensor of shape (num_rois, C, output_height, output_width).

RoiAlign was proposed to avoid misalignment by removing the quantization steps applied when mapping from the original image to the feature map and from the feature map to the RoI feature; in each RoI bin, the values at the sampled locations are computed directly through bilinear interpolation.
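To make the bilinear interpolation step concrete, here is a minimal pure-Python sketch. It is not the operator's reference implementation: the function name `bilinear_sample` is hypothetical, and clamping out-of-range coordinates to the border is an assumption made for illustration.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D grid (list of rows) at fractional (y, x).

    Assumption for this sketch: coordinates are clamped to the grid, so
    points slightly outside the map read the nearest edge value.
    """
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    ly, lx = y - y0, x - x0      # fractional offsets from the top-left neighbour
    hy, hx = 1.0 - ly, 1.0 - lx  # complementary weights
    return (hy * hx * feature[y0][x0] + hy * lx * feature[y0][x1]
            + ly * hx * feature[y1][x0] + ly * lx * feature[y1][x1])


feature = [[0.0, 1.0],
           [2.0, 3.0]]
center = bilinear_sample(feature, 0.5, 0.5)  # equal weight on all four neighbours
```

Because the sample location is never rounded to an integer index, no quantization enters the computation.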

Attributes

coordinate_transformation_mode: Allowed values are ‘half_pixel’ and ‘output_half_pixel’. Use ‘half_pixel’ to shift the input coordinates by -0.5 pixels (the recommended behavior). Use ‘output_half_pixel’ to omit the pixel shift for the input (the backward-compatible behavior). Default value is ‘half_pixel’.
mode: The pooling method. Two modes are supported: ‘avg’ and ‘max’. Default value is ‘avg’.
output_height: Height of the pooled output Y. Default value is 1.
output_width: Width of the pooled output Y. Default value is 1.
sampling_ratio: Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, exactly sampling_ratio x sampling_ratio grid points are used. If == 0, an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default value is 0.
spatial_scale: Multiplicative spatial scale factor that translates RoI coordinates from their input spatial scale to the scale used when pooling, i.e., the spatial scale of the input feature map X relative to the input image. Default value is 1.0.
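The interaction of spatial_scale, coordinate_transformation_mode, and sampling_ratio can be sketched as follows. This is an illustration under stated assumptions, not the normative algorithm. The helper name `sample_points` is hypothetical. Two behaviors follow common RoIAlign implementations rather than anything stated in the attribute text: samples are placed at the centres of a regular sub-grid within each bin, and the RoI is forced to at least 1x1 in ‘output_half_pixel’ mode.

```python
import math


def sample_points(roi, output_height, output_width, spatial_scale=1.0,
                  sampling_ratio=0,
                  coordinate_transformation_mode="half_pixel"):
    """Return {(ph, pw): [(y, x), ...]} sample locations for one RoI.

    Assumed convention: samples sit at the centres of a regular sub-grid
    inside each output bin.
    """
    x1, y1, x2, y2 = roi
    half_pixel = coordinate_transformation_mode == "half_pixel"
    offset = 0.5 if half_pixel else 0.0
    # Map the RoI from image coordinates to feature-map coordinates.
    start_w = x1 * spatial_scale - offset
    start_h = y1 * spatial_scale - offset
    roi_w = x2 * spatial_scale - offset - start_w
    roi_h = y2 * spatial_scale - offset - start_h
    if not half_pixel:
        # Assumption: the legacy mode forces RoIs to be at least 1x1.
        roi_w, roi_h = max(roi_w, 1.0), max(roi_h, 1.0)
    bin_h, bin_w = roi_h / output_height, roi_w / output_width
    # sampling_ratio > 0 fixes the grid; 0 selects the adaptive size.
    grid_h = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_h / output_height)
    grid_w = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_w / output_width)
    points = {}
    for ph in range(output_height):
        for pw in range(output_width):
            points[(ph, pw)] = [
                (start_h + ph * bin_h + (iy + 0.5) * bin_h / grid_h,
                 start_w + pw * bin_w + (ix + 0.5) * bin_w / grid_w)
                for iy in range(grid_h)
                for ix in range(grid_w)
            ]
    return points


aligned = sample_points((0, 0, 4, 4), 2, 2, sampling_ratio=2)
legacy = sample_points((0, 0, 4, 4), 2, 2, sampling_ratio=2,
                       coordinate_transformation_mode="output_half_pixel")
# aligned[(0, 0)] starts at (0.0, 0.0); legacy[(0, 0)] starts at (0.5, 0.5)
```

Under this sketch, ‘avg’ mode would take the mean of the interpolated values at each bin's points and ‘max’ mode their maximum.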

Inputs

X (heterogeneous) - T1: Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
rois (heterogeneous) - T1: RoIs (Regions of Interest) to pool over; rois is a 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], …]. The RoIs’ coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.
batch_indices (heterogeneous) - T2: 1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.
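The relationships between the three inputs can be checked before building a model. The sketch below is a hypothetical helper (`check_roi_inputs` is not part of ONNX) that works on plain Python lists and a shape tuple. The range check on batch indices is an assumption drawn from the description of batch_indices as an index into the batch.

```python
def check_roi_inputs(x_shape, rois, batch_indices):
    """Validate the input relationships described above.

    Returns num_rois on success and raises ValueError otherwise.
    """
    if len(x_shape) != 4:
        raise ValueError("X must be 4-D with shape (N, C, H, W)")
    if any(len(roi) != 4 for roi in rois):
        raise ValueError("each RoI must be given as [x1, y1, x2, y2]")
    if len(batch_indices) != len(rois):
        raise ValueError("batch_indices needs exactly one entry per RoI")
    batch_size = x_shape[0]
    # Assumption: every index must refer to an existing image in the batch.
    if any(not 0 <= index < batch_size for index in batch_indices):
        raise ValueError("batch index outside [0, N)")
    return len(rois)


num_rois = check_roi_inputs(
    (2, 3, 8, 8),                    # X: N=2, C=3, H=8, W=8
    [[0, 0, 4, 4], [1, 1, 5, 5]],    # rois: one box per row
    [0, 1],                          # batch_indices: image for each box
)
```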

Outputs

Y (heterogeneous) - T1: RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI, rois[r-1].
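As a quick illustration of the output shape rule, the following sketch derives the shape of Y from the input shapes and the two output-size attributes. The function name `roi_align_output_shape` is hypothetical, and the input shapes in the usage line are example values, not taken from the test data.

```python
def roi_align_output_shape(x_shape, rois_shape, output_height=1, output_width=1):
    """Shape of Y given X of shape (N, C, H, W) and rois of shape (num_rois, 4)."""
    _, channels, _, _ = x_shape
    num_rois, coords = rois_shape
    if coords != 4:
        raise ValueError("rois must have shape (num_rois, 4)")
    # The batch size N and spatial size (H, W) of X do not affect Y's shape.
    return (num_rois, channels, output_height, output_width)


y_shape = roi_align_output_shape((1, 1, 10, 10), (3, 4),
                                 output_height=5, output_width=5)
```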

Type Constraints

T1 in ( tensor(double), tensor(float), tensor(float16) ): Constrain types to float tensors.
T2 in ( tensor(int64) ): Constrain types to int tensors.

Examples

_roialign_aligned_false

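import numpy as np
import onnx

# `expect` and `get_roi_align_input_values` come from the ONNX backend node
# tests and are not defined in this snippet.
node = onnx.helper.make_node(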
    "RoiAlign",
    inputs=["X", "rois", "batch_indices"],
    outputs=["Y"],
    spatial_scale=1.0,
    output_height=5,
    output_width=5,
    sampling_ratio=2,
    coordinate_transformation_mode="output_half_pixel",
)

X, batch_indices, rois = get_roi_align_input_values()
# (num_rois, C, output_height, output_width)
Y = np.array(
    [
        [
            [
                [0.4664, 0.4466, 0.3405, 0.5688, 0.6068],
                [0.3714, 0.4296, 0.3835, 0.5562, 0.3510],
                [0.2768, 0.4883, 0.5222, 0.5528, 0.4171],
                [0.4713, 0.4844, 0.6904, 0.4920, 0.8774],
                [0.6239, 0.7125, 0.6289, 0.3355, 0.3495],
            ]
        ],
        [
            [
                [0.3022, 0.4305, 0.4696, 0.3978, 0.5423],
                [0.3656, 0.7050, 0.5165, 0.3172, 0.7015],
                [0.2912, 0.5059, 0.6476, 0.6235, 0.8299],
                [0.5916, 0.7389, 0.7048, 0.8372, 0.8893],
                [0.6227, 0.6153, 0.7097, 0.6154, 0.4585],
            ]
        ],
        [
            [
                [0.2384, 0.3379, 0.3717, 0.6100, 0.7601],
                [0.3767, 0.3785, 0.7147, 0.9243, 0.9727],
                [0.5749, 0.5826, 0.5709, 0.7619, 0.8770],
                [0.5355, 0.2566, 0.2141, 0.2796, 0.3600],
                [0.4365, 0.3504, 0.2887, 0.3661, 0.2349],
            ]
        ],
    ],
    dtype=np.float32,
)

expect(
    node,
    inputs=[X, rois, batch_indices],
    outputs=[Y],
    name="test_roialign_aligned_false",
)

_roialign_aligned_true

node = onnx.helper.make_node(
    "RoiAlign",
    inputs=["X", "rois", "batch_indices"],
    outputs=["Y"],
    spatial_scale=1.0,
    output_height=5,
    output_width=5,
    sampling_ratio=2,
    coordinate_transformation_mode="half_pixel",
)

X, batch_indices, rois = get_roi_align_input_values()
# (num_rois, C, output_height, output_width)
Y = np.array(
    [
        [
            [
                [0.5178, 0.3434, 0.3229, 0.4474, 0.6344],
                [0.4031, 0.5366, 0.4428, 0.4861, 0.4023],
                [0.2512, 0.4002, 0.5155, 0.6954, 0.3465],
                [0.3350, 0.4601, 0.5881, 0.3439, 0.6849],
                [0.4932, 0.7141, 0.8217, 0.4719, 0.4039],
            ]
        ],
        [
            [
                [0.3070, 0.2187, 0.3337, 0.4880, 0.4870],
                [0.1871, 0.4914, 0.5561, 0.4192, 0.3686],
                [0.1433, 0.4608, 0.5971, 0.5310, 0.4982],
                [0.2788, 0.4386, 0.6022, 0.7000, 0.7524],
                [0.5774, 0.7024, 0.7251, 0.7338, 0.8163],
            ]
        ],
        [
            [
                [0.2393, 0.4075, 0.3379, 0.2525, 0.4743],
                [0.3671, 0.2702, 0.4105, 0.6419, 0.8308],
                [0.5556, 0.4543, 0.5564, 0.7502, 0.9300],
                [0.6626, 0.5617, 0.4813, 0.4954, 0.6663],
                [0.6636, 0.3721, 0.2056, 0.1928, 0.2478],
            ]
        ],
    ],
    dtype=np.float32,
)

expect(
    node,
    inputs=[X, rois, batch_indices],
    outputs=[Y],
    name="test_roialign_aligned_true",
)

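Differences

Compared with RoiAlign - 10, version 16 adds the coordinate_transformation_mode attribute (default ‘half_pixel’). The summary, the remaining attributes, the inputs, the outputs, and the type constraints are unchanged.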

RoiAlign - 10

Version

name: RoiAlign (GitHub)
domain: main
since_version: 10
function: False
support_level: SupportType.COMMON
shape inference: True

This version of the operator has been available since version 10.

Summary

Region of Interest (RoI) align operation described in the [Mask R-CNN paper](https://arxiv.org/abs/1703.06870). RoiAlign consumes an input tensor X and regions of interest (rois) and applies pooling across each RoI; it produces a 4-D tensor of shape (num_rois, C, output_height, output_width).

RoiAlign was proposed to avoid misalignment by removing the quantization steps applied when mapping from the original image to the feature map and from the feature map to the RoI feature; in each RoI bin, the values at the sampled locations are computed directly through bilinear interpolation.

Attributes

mode: The pooling method. Two modes are supported: ‘avg’ and ‘max’. Default value is ‘avg’.
output_height: Height of the pooled output Y. Default value is 1.
output_width: Width of the pooled output Y. Default value is 1.
sampling_ratio: Number of sampling points in the interpolation grid used to compute the output value of each pooled output bin. If > 0, exactly sampling_ratio x sampling_ratio grid points are used. If == 0, an adaptive number of grid points is used (computed as ceil(roi_width / output_width), and likewise for height). Default value is 0.
spatial_scale: Multiplicative spatial scale factor that translates RoI coordinates from their input spatial scale to the scale used when pooling, i.e., the spatial scale of the input feature map X relative to the input image. Default value is 1.0.

Inputs

X (heterogeneous) - T1: Input data tensor from the previous operator; 4-D feature map of shape (N, C, H, W), where N is the batch size, C is the number of channels, and H and W are the height and the width of the data.
rois (heterogeneous) - T1: RoIs (Regions of Interest) to pool over; rois is a 2-D input of shape (num_rois, 4) given as [[x1, y1, x2, y2], …]. The RoIs’ coordinates are in the coordinate system of the input image. Each coordinate set has a 1:1 correspondence with the ‘batch_indices’ input.
batch_indices (heterogeneous) - T2: 1-D tensor of shape (num_rois,) with each element denoting the index of the corresponding image in the batch.

Outputs

Y (heterogeneous) - T1: RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element Y[r-1] is a pooled feature map corresponding to the r-th RoI, rois[r-1].

Type Constraints

T1 in ( tensor(double), tensor(float), tensor(float16) ): Constrain types to float tensors.
T2 in ( tensor(int64) ): Constrain types to int tensors.

