RedNet: Residual Encoder-Decoder Network for indoor RGB-D Semantic Segmentation
Jindong Jiang, Lunan Zheng, Fei Luo, and Zhijun Zhang

TL;DR
RedNet is a residual encoder-decoder network that fuses RGB and depth features for indoor semantic segmentation, reporting state-of-the-art accuracy (47.8% mIoU) on the SUN RGB-D benchmark.
Contribution
The paper introduces RedNet, a residual encoder-decoder architecture with a fusion structure and pyramid supervision for improved indoor RGB-D semantic segmentation.
Findings
Achieves 47.8% mIoU on the SUN RGB-D dataset.
Uses residual modules as the basic building block in both the encoder and the decoder.
Employs pyramid supervision, which applies supervised learning at several decoder layers, to counter vanishing gradients during training.
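For readers unfamiliar with the metric behind the 47.8% figure, mean intersection-over-union (mIoU) can be sketched as below. This is a minimal illustration, not the paper's evaluation code: the function name `mean_iou` is hypothetical, and skipping classes that appear in neither the prediction nor the ground truth is an assumption, since benchmark protocols differ on ignored labels and on how counts are accumulated across images.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over flattened per-pixel class indices.

    Assumption: classes absent from both pred and target are skipped
    rather than counted as IoU 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as c
    target_count = [0] * num_classes   # pixels labelled as c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
toy_pred = [0, 0, 1, 1, 2, 2]
toy_target = [0, 1, 1, 1, 2, 0]
toy_score = mean_iou(toy_pred, toy_target, num_classes=3)
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so `toy_score` is 0.5.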
Abstract
Indoor semantic segmentation has always been a difficult task in computer vision. In this paper, we propose an RGB-D residual encoder-decoder architecture, named RedNet, for indoor RGB-D semantic segmentation. In RedNet, the residual module is applied to both the encoder and decoder as the basic building block, and the skip-connection is used to bypass the spatial feature between the encoder and decoder. In order to incorporate the depth information of the scene, a fusion structure is constructed, which makes inference on RGB image and depth image separately, and fuses their features over several layers. In order to efficiently optimize the network's parameters, we propose a 'pyramid supervision' training scheme, which applies supervised learning over different layers in the decoder, to cope with the problem of gradients vanishing. Experiment results show that the proposed…
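The pyramid supervision idea described in the abstract, supervising several decoder layers rather than only the final output, might be sketched as a sum of losses over score maps at different resolutions. This is a rough illustration under stated assumptions, not the authors' implementation: the names `downsample_labels`, `cross_entropy`, and `pyramid_loss` are hypothetical, and the choice of scales, the equal weighting of the terms, and nearest-neighbour downsampling of the labels are all assumptions not stated in this summary.

```python
import math
from typing import Dict, List

Grid = List[List[int]]             # label map: H x W class indices
Logits = List[List[List[float]]]   # score map: H x W x num_classes


def downsample_labels(labels: Grid, factor: int) -> Grid:
    """Nearest-neighbour downsampling: keep every `factor`-th pixel (assumed)."""
    return [row[::factor] for row in labels[::factor]]


def cross_entropy(logits: Logits, labels: Grid) -> float:
    """Mean per-pixel softmax cross-entropy between a score map and labels."""
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for scores, label in zip(logit_row, label_row):
            peak = max(scores)  # subtract the max for numerical stability
            log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += log_sum - scores[label]
            count += 1
    return total / count


def pyramid_loss(outputs: Dict[int, Logits], labels: Grid) -> float:
    """Sum the cross-entropy of each decoder output against labels resized
    to that output's resolution. Keys are downsampling factors (1 = full size).
    Equal weighting of the terms is an assumption."""
    return sum(
        cross_entropy(logits, downsample_labels(labels, factor))
        for factor, logits in outputs.items()
    )


# Toy usage: a 4x4 label map with two classes, and uniform (all-zero)
# score maps at full, half, and quarter resolution.
toy_labels = [[0, 0, 1, 1]] * 4
toy_outputs = {
    1: [[[0.0, 0.0] for _ in range(4)] for _ in range(4)],
    2: [[[0.0, 0.0] for _ in range(2)] for _ in range(2)],
    4: [[[0.0, 0.0]]],
}
toy_total = pyramid_loss(toy_outputs, toy_labels)
```

Because every auxiliary output contributes its own loss term, gradients reach the deeper decoder layers directly instead of only through the final output, which is the motivation the abstract gives for the scheme.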
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
