LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling
Zhen Li, Yukang Gan, Xiaodan Liang, Yizhou Yu, Hui Cheng, Liang Lin

TL;DR
This paper introduces LSTM-CF, a novel deep learning model that effectively captures and fuses contextual information from RGB and depth data for improved pixelwise semantic labeling of scenes.
Contribution
The paper proposes a new LSTM-based context fusion model integrated into CNNs, enhancing scene labeling accuracy by capturing long-range dependencies across channels and spatial directions.
Findings
Achieved state-of-the-art accuracy on SUNRGBD and NYUDv2 datasets.
Improved average class accuracy by over 2% and 5% on SUNRGBD and NYUDv2, respectively.
Demonstrated effective vertical and horizontal context fusion for scene understanding.
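The summary does not say how average class accuracy is computed. The sketch below assumes the usual scene-labeling definition: per-class recall averaged over the classes present in the ground truth, with overall pixel accuracy shown for contrast. The function names and the toy label maps are illustrative and do not come from the paper.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    # Rows index the ground-truth class, columns the predicted class.
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def pixel_accuracy(cm):
    # Fraction of all pixels labeled correctly.
    return np.trace(cm) / cm.sum()

def mean_class_accuracy(cm):
    # Assumed definition: average of per-class recall over classes
    # that actually occur in the ground truth.
    per_class_total = cm.sum(axis=1)
    present = per_class_total > 0
    return np.mean(np.diag(cm)[present] / per_class_total[present])

# Toy 2x4 label maps with two classes (made-up data).
y_true = np.array([[0, 0, 0, 0], [0, 0, 1, 1]])
y_pred = np.array([[0, 0, 0, 0], [0, 0, 0, 1]])
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(pixel_accuracy(cm))       # 0.875
print(mean_class_accuracy(cm))  # 0.75
```

The toy case shows why the two numbers can diverge: the rare class is only half right, which barely moves pixel accuracy but pulls the class average down.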
Abstract
Semantic labeling of RGB-D scenes is crucial to many intelligent applications including perceptual robotics. It generates pixelwise and fine-grained label maps from simultaneously sensed photometric (RGB) and depth channels. This paper addresses this problem by i) developing a novel Long Short-Term Memorized Context Fusion (LSTM-CF) Model that captures and fuses contextual information from multiple channels of photometric and depth data, and ii) incorporating this model into deep convolutional neural networks (CNNs) for end-to-end training. Specifically, contexts in photometric and depth channels are, respectively, captured by stacking several convolutional layers and a long short-term memory layer; the memory layer encodes both short-range and long-range spatial dependencies in an image along the vertical direction. Another long short-term memorized fusion layer is set up to integrate…
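The data flow described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the convolutional layers are replaced by random feature maps, each LSTM is unidirectional with random weights, and the horizontal scan direction of the fusion layer is an assumption drawn from the Findings list, since the abstract is cut off before describing it. All names and layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(in_dim, hid_dim, rng):
    # One weight matrix holding the four gates (input, forget, output, candidate).
    return {
        "W": rng.normal(0.0, 0.1, size=(in_dim + hid_dim, 4 * hid_dim)),
        "b": np.zeros(4 * hid_dim),
    }

def lstm_scan(seq, params):
    """Run an LSTM along axis 0 of seq (T, N, in_dim); returns (T, N, hid_dim)."""
    T, N, _ = seq.shape
    hid = params["b"].shape[0] // 4
    h = np.zeros((N, hid))
    c = np.zeros((N, hid))
    out = np.empty((T, N, hid))
    for t in range(T):
        z = np.concatenate([seq[t], h], axis=1) @ params["W"] + params["b"]
        i, f, o, g = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[t] = h
    return out

def vertical_context(feat, params):
    # feat: (H, W, C). Every image column is a sequence scanned top to bottom.
    return lstm_scan(feat, params)

def horizontal_fusion(rgb_ctx, depth_ctx, params):
    # Concatenate the two modalities per pixel, then scan each row left to right
    # (assumed direction; see the note above).
    fused = np.concatenate([rgb_ctx, depth_ctx], axis=2)
    out = lstm_scan(fused.transpose(1, 0, 2), params)
    return out.transpose(1, 0, 2)

rng = np.random.default_rng(0)
H, W = 6, 8
rgb_feat = rng.normal(size=(H, W, 16))    # stand-in for RGB conv features
depth_feat = rng.normal(size=(H, W, 16))  # stand-in for depth conv features

rgb_ctx = vertical_context(rgb_feat, init_lstm(16, 12, rng))
depth_ctx = vertical_context(depth_feat, init_lstm(16, 12, rng))
fused = horizontal_fusion(rgb_ctx, depth_ctx, init_lstm(24, 10, rng))
print(fused.shape)  # (6, 8, 10)
```

Because the vertical scan carries state down each column, the output at a pixel depends on every pixel above it in that column, which is how a recurrent layer can pick up both short-range and long-range dependencies along one direction.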
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
