GSTO: Gated Scale-Transfer Operation for Multi-Scale Feature Learning in Pixel Labeling
Zhuoying Wang, Yongtao Wang, Zhi Tang, Yangyan Li, Ying Chen, Haibin Ling, Weisi Lin

TL;DR
This paper introduces GSTO, a novel gated scale-transfer operation that enhances multi-scale feature learning in CNNs for pixel labeling, leading to state-of-the-art results with minimal extra computational cost.
Contribution
GSTO provides a new spatially selective scale-transfer operation that improves multi-scale feature learning in CNNs, surpassing traditional methods like up-sampling and down-sampling.
Findings
GSTO achieves state-of-the-art results on COCO, Cityscapes, LIP, and Pascal Context benchmarks.
GSTO can be integrated into existing networks and modules, boosting their performance.
GSTO is lightweight, flexible, and can operate with or without supervision.
Abstract
Existing CNN-based methods for pixel labeling depend heavily on multi-scale features to meet the requirements of both semantic comprehension and detail preservation. State-of-the-art pixel labeling neural networks widely exploit conventional scale-transfer operations, i.e., up-sampling and down-sampling, to learn multi-scale features. In this work, we find that these operations lead to scale-confused features and suboptimal performance because they are spatially invariant and directly transfer all feature information across scales without spatial selection. To address this issue, we propose the Gated Scale-Transfer Operation (GSTO) to properly transfer spatially filtered features to another scale. Specifically, GSTO can work either with or without extra supervision. Unsupervised GSTO is learned from the feature itself while the supervised one is guided by the supervised probability matrix. Both…
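The idea of gating a feature map before transferring it to another scale can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the unsupervised gate is a per-pixel sigmoid over a 1x1 convolution of the feature itself, and it uses nearest-neighbour up-sampling as the scale transfer. The function names (`gate_map`, `upsample_nearest`, `gated_upsample`) and the `weights` and `bias` parameters are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate_map(feature, weights, bias):
    """Per-pixel gate: a 1x1 convolution over channels followed by a sigmoid.

    feature is a nested list of shape [channels][height][width];
    weights holds one scalar per channel (hypothetical learned parameters).
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            sigmoid(sum(weights[c] * feature[c][y][x] for c in range(channels)) + bias)
            for x in range(width)
        ]
        for y in range(height)
    ]


def upsample_nearest(channel, factor):
    """Nearest-neighbour up-sampling of a single 2-D channel."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def gated_upsample(feature, weights, bias, factor=2):
    """Multiply every channel by the spatial gate, then up-sample the result."""
    gate = gate_map(feature, weights, bias)
    height, width = len(gate), len(gate[0])
    gated = [
        [[channel[y][x] * gate[y][x] for x in range(width)] for y in range(height)]
        for channel in feature
    ]
    return [upsample_nearest(channel, factor) for channel in gated]


# Toy input: 2 channels, 2x2 spatial size.
feature = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
out = gated_upsample(feature, weights=[0.0, 0.0], bias=0.0)
```

With zero weights and zero bias the gate is 0.5 at every pixel, so the sketch reduces to plain up-sampling scaled by one half. A learned gate would instead vary per pixel, which is the spatial selection that conventional up-sampling and down-sampling lack.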
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
Methods: Batch Normalization · Dilated Convolution · Spatial Pyramid Pooling · Residual Connection · Convolution · Atrous Spatial Pyramid Pooling · HRNet
