A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

Jianbo Liu; Sijie Ren; Yuanjie Zheng; Xiaogang Wang; Hongsheng Li

arXiv:2012.10162·cs.CV·December 21, 2020·1 cites

TL;DR

This paper introduces a holistically-guided decoder that efficiently generates high-resolution, semantic-rich features for visual tasks, outperforming existing methods in segmentation and detection with lower computational costs.

Contribution

The paper proposes a novel holistically-guided decoder that leverages multi-scale encoder features to produce high-resolution semantic features efficiently, improving performance and reducing computational costs.

Findings

01

EfficientFCN achieves comparable or better segmentation performance than state-of-the-art methods at about 1/3 of their computational cost.

02

HGD-FPN improves object detection mAP by over 2% with ResNet-50 backbones.

03

The method effectively combines high-level and low-level features for enhanced visual understanding.

Abstract

Both high-level and high-resolution feature representations are of great importance in various visual understanding tasks. To acquire high-resolution feature maps with high-level semantic information, one common strategy is to adopt dilated convolutions in the backbone network, as in the dilatedFCN-based methods for semantic segmentation. However, because many convolution operations are conducted on the high-resolution feature maps, such methods have large computational complexity and memory consumption. In this paper, we propose a novel holistically-guided decoder that obtains high-resolution, semantic-rich feature maps from the multi-scale features of the encoder. The decoding is achieved via novel holistic codeword generation and codeword assembly operations, which take advantage of both the high-level and low-level…
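The abstract's point about the cost of convolving high-resolution feature maps can be illustrated with a rough count of multiply-accumulate operations (MACs). The sketch below is not taken from the paper: the image size, channel width, and output strides (8 for a dilated backbone, 32 for a standard one) are assumed values chosen only for illustration, and `conv_macs` and `feature_size` are hypothetical helper names.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """MACs for one stride-1, 'same'-padded convolution layer.

    Dilation spreads the kernel taps apart but does not change their
    number, so a dilated 3x3 convolution costs the same per output pixel.
    """
    return height * width * c_in * c_out * kernel * kernel


def feature_size(image_size, output_stride):
    """Spatial side length of a feature map at a given output stride."""
    return image_size // output_stride


# Assumed, illustrative settings (not the paper's experimental setup).
image_size = 512
channels = 2048

side_os8 = feature_size(image_size, 8)    # dilated backbone keeps 1/8 resolution
side_os32 = feature_size(image_size, 32)  # standard backbone ends at 1/32 resolution

macs_os8 = conv_macs(side_os8, side_os8, channels, channels)
macs_os32 = conv_macs(side_os32, side_os32, channels, channels)

# The same layer is (32 / 8) ** 2 times more expensive at the higher resolution.
ratio = macs_os8 // macs_os32
print(f"OS=8: {macs_os8:,} MACs, OS=32: {macs_os32:,} MACs, ratio: {ratio}x")
```

Under these assumptions a single layer costs 16 times more at output stride 8 than at output stride 32. This per-layer ratio is only a motivation for moving the high-resolution work into a lightweight decoder; it is not the source of the whole-network 1/3 cost figure reported in the findings.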

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Convolution