Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling

Abrar H. Abdulnabi; Bing Shuai; Zhen Zuo; Lap-Pui Chau; Gang Wang

arXiv:1803.04687·cs.CV·March 14, 2018

TL;DR

This paper introduces Multimodal RNNs with information transfer layers for improved indoor scene labeling using RGB-D data, effectively capturing cross-modality features and contextual information to outperform existing methods.

Contribution

The paper presents a novel Multimodal RNN architecture with learnable transfer layers for RGB-D scene segmentation, enhancing cross-modality feature extraction.

Findings

01 Outperforms previous methods on RGB-D benchmarks

02 Achieves competitive results with state-of-the-art approaches

03 Effectively models contextual information in 2D images

Abstract

This paper proposes a new method, Multimodal RNNs, for semantic segmentation of RGB-D scenes. The model is optimized to classify image pixels given two input sources: RGB color channels and depth maps. It simultaneously trains two recurrent neural networks (RNNs) that are cross-connected through information transfer layers, which are learned to adaptively extract relevant cross-modality features. Each RNN learns its representations from its own previous hidden states and from patterns transferred from the other RNN's previous hidden states; thus, both model-specific and cross-modality features are retained. We exploit the structure of quad-directional 2D-RNNs to model the short- and long-range contextual information in the 2D input image. We carefully designed various baselines to efficiently examine our proposed model structure. We test our Multimodal RNNs method on popular…
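The cross-connection described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it scans a 1D sequence in one direction instead of using quad-directional 2D-RNNs, it assumes a tanh nonlinearity, and it uses small random weights in place of learned ones. The names `make_params`, `step`, `multimodal_scan`, and the matrices `U`, `W`, `T` are hypothetical. The only idea taken from the abstract is that each RNN updates its hidden state from its own previous hidden state plus a transferred view of the other RNN's previous hidden state.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a placeholder for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_params(in_dim, hidden, rng):
    """Hypothetical parameter set for one modality-specific RNN."""
    return {
        "U": rand_matrix(hidden, in_dim, rng),  # input -> hidden
        "W": rand_matrix(hidden, hidden, rng),  # own previous hidden -> hidden
        "T": rand_matrix(hidden, hidden, rng),  # transfer layer: other RNN's previous hidden -> hidden
    }


def step(x, h_own, h_other, params):
    """One update: h_new = tanh(U x + W h_own + T h_other). tanh is an assumption."""
    own_input = matvec(params["U"], x)
    own_recurrent = matvec(params["W"], h_own)
    transferred = matvec(params["T"], h_other)
    return [math.tanh(a + b + c)
            for a, b, c in zip(own_input, own_recurrent, transferred)]


def multimodal_scan(rgb_seq, depth_seq, rgb_params, depth_params, hidden):
    """Run the two cross-connected RNNs over paired RGB and depth inputs."""
    h_rgb = [0.0] * hidden
    h_depth = [0.0] * hidden
    states = []
    for x_rgb, x_depth in zip(rgb_seq, depth_seq):
        # Both updates read the previous states, so compute both before assigning.
        new_rgb = step(x_rgb, h_rgb, h_depth, rgb_params)
        new_depth = step(x_depth, h_depth, h_rgb, depth_params)
        h_rgb, h_depth = new_rgb, new_depth
        states.append((h_rgb, h_depth))
    return states


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = 4
    rgb_params = make_params(3, hidden, rng)    # 3 color channels per position
    depth_params = make_params(1, hidden, rng)  # 1 depth value per position
    rgb_seq = [[0.2, 0.4, 0.6], [0.1, 0.3, 0.5]]
    depth_seq = [[0.7], [0.9]]
    for h_rgb, h_depth in multimodal_scan(rgb_seq, depth_seq,
                                          rgb_params, depth_params, hidden):
        print(h_rgb, h_depth)
```

Setting a `T` matrix to all zeros in this sketch cuts the corresponding RNN off from the other modality, which is one simple way to picture a baseline without information transfer.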
