A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection
Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li

TL;DR
This paper introduces a holistically-guided decoder that efficiently generates high-resolution, semantic-rich features for visual tasks, outperforming existing methods in segmentation and detection with lower computational costs.
Contribution
The paper proposes a novel holistically-guided decoder that leverages multi-scale encoder features to produce high-resolution semantic features efficiently, improving performance and reducing computational costs.
Findings
EfficientFCN achieves segmentation performance comparable to or better than existing methods at 1/3 of their computational cost.
HGD-FPN improves object detection mAP by over 2% with ResNet-50 backbones.
The method effectively combines high-level and low-level features for enhanced visual understanding.
Abstract
Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps, as in the dilatedFCN-based methods for semantic segmentation. However, because many convolution operations are conducted on the high-resolution feature maps, such methods have large computational complexity and memory consumption. In this paper, we propose a novel holistically-guided decoder that obtains high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level…
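The abstract is truncated here and does not spell out the two decoding operations, so the following is only a minimal NumPy sketch of one plausible reading: a small set of codewords is pooled from a coarse feature map, and each fine-resolution position then rebuilds its feature as a linear combination of those codewords. The function names, the projection matrices `w_spatial` and `w_coef`, the softmax pooling, and all shapes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def generate_codewords(low_res_feats, w_spatial):
    """Pool K codewords from a flattened coarse feature map.

    low_res_feats: (N_low, C) features, one row per coarse position.
    w_spatial:     (C, K) hypothetical projection producing K weighting maps.
    Returns (K, C): each codeword is a weighted average over all positions.
    """
    # Assumption: weights are normalized over positions, so every codeword
    # summarizes the whole coarse map.
    weights = softmax(low_res_feats @ w_spatial, axis=0)  # (N_low, K)
    return weights.T @ low_res_feats                      # (K, C)


def assemble(guidance_feats, w_coef, codewords):
    """Rebuild fine-resolution features from the codewords.

    guidance_feats: (N_high, C_g) features, one row per fine position.
    w_coef:         (C_g, K) hypothetical projection giving K coefficients.
    Returns (N_high, C): a linear combination of codewords per position.
    """
    coefs = guidance_feats @ w_coef  # (N_high, K)
    return coefs @ codewords         # (N_high, C)


rng = np.random.default_rng(0)
C, K = 8, 4
low = rng.standard_normal((4 * 4, C))      # toy 4x4 coarse map, flattened
guide = rng.standard_normal((16 * 16, C))  # toy 16x16 fine map, flattened
w_spatial = rng.standard_normal((C, K))
w_coef = rng.standard_normal((C, K))

codewords = generate_codewords(low, w_spatial)
high = assemble(guide, w_coef, codewords)
print(codewords.shape, high.shape)  # (4, 8) (256, 8)
```

In this sketch the expensive pooling runs only over the 16 coarse positions, while the fine-resolution step is a single small matrix product per position. That is one way to read the abstract's claim of lower cost than running many convolutions on high-resolution maps.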
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Convolution
