Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling
Abrar H. Abdulnabi, Bing Shuai, Zhen Zuo, Lap-Pui Chau, Gang Wang

TL;DR
This paper introduces Multimodal RNNs with information transfer layers for improved indoor scene labeling using RGB-D data, effectively capturing cross-modality features and contextual information to outperform existing methods.
Contribution
The paper presents a novel Multimodal RNN architecture with learnable transfer layers for RGB-D scene segmentation, enhancing cross-modality feature extraction.
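To make the idea of a learnable transfer layer concrete, here is a minimal NumPy sketch of one cross-connected recurrent step, assuming a plain tanh RNN per modality and a single linear transfer matrix in each direction. The names (`init_params`, `cross_modal_step`, `W_xfer`) and the 1D sequence setting are illustrative assumptions, not the authors' implementation; the paper itself works on 2D images.

```python
import numpy as np

def init_params(in_dim, hid_dim, rng, scale=0.3):
    """Random weights for one modality's RNN plus its incoming transfer layer (hypothetical layout)."""
    return {
        "W_in": rng.normal(0.0, scale, (hid_dim, in_dim)),     # input -> hidden
        "W_rec": rng.normal(0.0, scale, (hid_dim, hid_dim)),   # own previous hidden -> hidden
        "W_xfer": rng.normal(0.0, scale, (hid_dim, hid_dim)),  # transfer layer: other modality's previous hidden -> hidden
        "b": np.zeros(hid_dim),
    }

def cross_modal_step(x_rgb, x_depth, h_rgb, h_depth, p_rgb, p_depth):
    """One time step: each RNN reads its own previous state and the other's, via its transfer layer."""
    new_rgb = np.tanh(
        p_rgb["W_in"] @ x_rgb + p_rgb["W_rec"] @ h_rgb + p_rgb["W_xfer"] @ h_depth + p_rgb["b"]
    )
    new_depth = np.tanh(
        p_depth["W_in"] @ x_depth + p_depth["W_rec"] @ h_depth + p_depth["W_xfer"] @ h_rgb + p_depth["b"]
    )
    return new_rgb, new_depth

def run_sequence(xs_rgb, xs_depth, p_rgb, p_depth, hid_dim):
    """Run both RNNs jointly over paired feature sequences and return the final hidden states."""
    h_rgb = np.zeros(hid_dim)
    h_depth = np.zeros(hid_dim)
    for x_r, x_d in zip(xs_rgb, xs_depth):
        h_rgb, h_depth = cross_modal_step(x_r, x_d, h_rgb, h_depth, p_rgb, p_depth)
    return h_rgb, h_depth
```

In this sketch both hidden states are updated from the previous step's values, so neither modality sees the other's current state; in training, the transfer matrices would be learned jointly with the rest of the weights.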
Findings
Outperforms previous methods on RGB-D benchmarks
Achieves competitive results with state-of-the-art approaches
Effectively models contextual information in 2D images
Abstract
This paper proposes a new method called Multimodal RNNs for RGB-D scene semantic segmentation. It is optimized to classify image pixels given two input sources: RGB color channels and depth maps. It simultaneously trains two recurrent neural networks (RNNs) that are cross-connected through information transfer layers, which are learnt to adaptively extract relevant cross-modality features. Each RNN model learns its representations from its own previous hidden states and from patterns transferred from the other RNN's previous hidden states; thus, both model-specific and cross-modality features are retained. We exploit the structure of quad-directional 2D-RNNs to model the short- and long-range contextual information in the 2D input image. We carefully design various baselines to efficiently examine our proposed model structure. We test our Multimodal RNNs method on popular…
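The quad-directional 2D-RNN idea mentioned in the abstract can be sketched as four corner-to-corner scans over a feature map whose outputs are summed. The sketch below assumes that each scan conditions a pixel on its two already-visited neighbors (one vertical, one horizontal), which is one common 2D-RNN formulation; the paper's exact connectivity and the way it combines directions may differ. All names (`make_params`, `scan_2d`, `quad_directional`) are hypothetical.

```python
import numpy as np

def make_params(in_dim, hid_dim, rng, scale=0.3):
    """Weights for one scan direction: (W_in, W_v, W_h, b). Hypothetical layout."""
    return (
        rng.normal(0.0, scale, (hid_dim, in_dim)),   # input -> hidden
        rng.normal(0.0, scale, (hid_dim, hid_dim)),  # vertical predecessor -> hidden
        rng.normal(0.0, scale, (hid_dim, hid_dim)),  # horizontal predecessor -> hidden
        np.zeros(hid_dim),
    )

def scan_2d(x, W_in, W_v, W_h, b, flip_rows=False, flip_cols=False):
    """One directional scan. x has shape (H, W, C); returns hidden states of shape (H, W, D)."""
    # Flipping the image lets every direction reuse the same top-left -> bottom-right loop.
    if flip_rows:
        x = x[::-1]
    if flip_cols:
        x = x[:, ::-1]
    height, width, _ = x.shape
    hid_dim = b.shape[0]
    h = np.zeros((height, width, hid_dim))
    zero = np.zeros(hid_dim)
    for i in range(height):
        for j in range(width):
            up = h[i - 1, j] if i > 0 else zero
            left = h[i, j - 1] if j > 0 else zero
            h[i, j] = np.tanh(W_in @ x[i, j] + W_v @ up + W_h @ left + b)
    # Undo the flips so the output aligns with the original pixel grid.
    if flip_cols:
        h = h[:, ::-1]
    if flip_rows:
        h = h[::-1]
    return h

def quad_directional(x, params):
    """Sum the four corner-to-corner scans; params is a list of four weight tuples."""
    directions = [(False, False), (False, True), (True, False), (True, True)]
    total = np.zeros(x.shape[:2] + (params[0][3].shape[0],))
    for p, (fr, fc) in zip(params, directions):
        total = total + scan_2d(x, *p, flip_rows=fr, flip_cols=fc)
    return total
```

A single scan only propagates information away from its starting corner, so a pixel sees just one quadrant of the image. Combining the four scans gives every pixel a path to every other pixel, which is how this kind of structure can carry both short- and long-range context.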
