Soft Correspondences in Multimodal Scene Parsing

Sarah Taghavi Namin, Mohammad Najafi, Mathieu Salzmann, and Lars Petersson

arXiv:1709.09843·cs.CV·September 29, 2017

Open Access

TL;DR

This paper introduces a novel CRF-based approach with latent nodes to handle inconsistencies in multimodal scene parsing, improving accuracy in 2D and 3D semantic labeling tasks.

Contribution

It proposes a method that explicitly models inconsistencies between modalities with latent nodes and learns the intra-domain and inter-domain potential functions from training data, outperforming state-of-the-art methods on two public datasets.

Findings

01

Outperforms state-of-the-art on two datasets

02

Effectively models modality inconsistencies

03

Improves semantic and geometric inference in 2D and 3D

Abstract

Exploiting multiple modalities for semantic scene parsing has been shown to improve accuracy over the single-modality scenario. However, multimodal datasets often suffer from problems such as data misalignment and label inconsistencies, whereas existing methods assume that corresponding regions in two modalities must have identical labels. We propose to address this issue by formulating multimodal semantic labeling as inference in a CRF and introducing latent nodes to explicitly model inconsistencies between two modalities. These latent nodes allow us not only to leverage information from both domains to improve their labeling, but also to cut the edges between inconsistent regions. We propose to learn intra-domain and inter-domain potential functions from training data to avoid hand-tuning of the model parameters. We evaluate our approach on two publicly available datasets containing…
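The central idea of the abstract is that a latent node on each cross-modal correspondence can either enforce label agreement or cut the edge. This can be illustrated with a toy energy. The sketch below is a simplified assumption and not the authors' model. Each latent node is treated as a binary gate `z` (1 = keep, 0 = cut), all potentials are Potts terms with hand-picked weights (the paper learns its potentials from training data), and MAP inference is done by brute force over a four-node graph. The node names (`img0`, `pc0`, ...), the weights `W_INTRA`, `W_INTER`, and `CUT_COST`, and the unary costs are all hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical toy problem: two image regions (2D) and two point-cloud
# segments (3D), two classes. All numbers are invented for illustration.
NUM_LABELS = 2

# Unary costs per node and label (lower = more likely).
unary: Dict[str, List[float]] = {
    "img0": [0.2, 1.0],
    "img1": [0.9, 0.3],
    "pc0": [0.1, 1.2],
    "pc1": [1.5, 0.2],
}

# Intra-domain neighbours (edges inside one modality).
intra_edges: List[Tuple[str, str]] = [("img0", "img1"), ("pc0", "pc1")]

# Inter-domain correspondences; ("img1", "pc0") plays a misaligned match.
inter_edges: List[Tuple[str, str]] = [("img0", "pc0"), ("img1", "pc1"), ("img1", "pc0")]

W_INTRA = 0.3   # assumed intra-domain Potts weight (learned in the paper)
W_INTER = 1.0   # assumed inter-domain Potts weight (learned in the paper)
CUT_COST = 0.4  # assumed cost of setting a latent gate to "cut"


def inter_cost(yi: int, yj: int) -> Tuple[float, int]:
    """Minimise one inter-domain term over its binary latent gate z.

    z = 1 keeps the edge (Potts disagreement cost), z = 0 cuts it at a
    fixed cost. Returns (cost, z).
    """
    keep_cost = W_INTER * float(yi != yj)
    if keep_cost <= CUT_COST:
        return keep_cost, 1
    return CUT_COST, 0


def energy(labels: Dict[str, int]) -> float:
    """Total CRF energy with the latent gates already minimised out."""
    e = sum(unary[n][labels[n]] for n in unary)
    e += sum(W_INTRA * float(labels[a] != labels[b]) for a, b in intra_edges)
    e += sum(inter_cost(labels[a], labels[b])[0] for a, b in inter_edges)
    return e


def latent_gates(labels: Dict[str, int]) -> Dict[Tuple[str, str], int]:
    """Optimal gate value for every correspondence under a given labeling."""
    return {(a, b): inter_cost(labels[a], labels[b])[1] for a, b in inter_edges}


def brute_force_map() -> Tuple[float, Dict[str, int]]:
    """Exhaustive MAP inference; only feasible for tiny graphs."""
    nodes = list(unary)
    best_e, best_labels = float("inf"), {}
    for assignment in itertools.product(range(NUM_LABELS), repeat=len(nodes)):
        labels = dict(zip(nodes, assignment))
        e = energy(labels)
        if e < best_e:
            best_e, best_labels = e, labels
    return best_e, best_labels


if __name__ == "__main__":
    e, labels = brute_force_map()
    print(f"MAP energy {e:.2f}, labels {labels}")
    print("latent gates (1 = keep, 0 = cut):", latent_gates(labels))
```

Each gate appears only in its own edge term, so it can be minimised out analytically. This turns the inter-domain Potts term into the truncated penalty min(W_INTER·[y_i ≠ y_j], CUT_COST). In this toy example, the MAP labeling keeps the two consistent correspondences and cuts the (img1, pc0) edge. Under a hard agreement constraint (all gates fixed to 1), that edge would instead cost the full W_INTER.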

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques

Methods: Conditional Random Field