Visual-Semantic Scene Understanding by Sharing Labels in a Context Network

Ishani Chakraborty; Ahmed Elgammal

arXiv:1309.3809 · cs.CV · September 17, 2013 · 1 citation

TL;DR

This paper introduces VSIM, a model that integrates visual and semantic context to improve object naming in complex scenes, using labels shared between the two contexts and an iterative inference algorithm.

Contribution

The paper proposes the Visual Semantic Integration Model (VSIM), which links semantic and visual context through object labels shared between them and infers those labels with an iterative algorithm. It outperforms state-of-the-art methods on the SUN09 dataset.
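The page describes labels shared between two contexts and an algorithm that pools label probabilities, but does not state the pooling rule. As a minimal sketch, assuming a weighted log-linear (geometric) pooling, which is a common choice and not necessarily the paper's, two per-label distributions for one object can be combined as follows. The function `pool_label_probabilities` and the weights `w_semantic` and `w_visual` are hypothetical.

```python
import numpy as np


def pool_label_probabilities(p_semantic, p_visual, w_semantic=0.5, w_visual=0.5):
    """Weighted log-linear (geometric) pooling of two label distributions.

    The pooling rule is an assumption for illustration, not the paper's rule.
    """
    eps = 1e-12  # avoids log(0) for labels that one context rules out entirely
    log_p = (w_semantic * np.log(np.asarray(p_semantic, dtype=float) + eps)
             + w_visual * np.log(np.asarray(p_visual, dtype=float) + eps))
    log_p -= log_p.max()  # shift for numerical stability before exponentiating
    p = np.exp(log_p)
    return p / p.sum()  # renormalize to a distribution over labels


labels = ["car", "road", "tree"]
pooled = pool_label_probabilities([0.5, 0.3, 0.2], [0.6, 0.1, 0.3])
print(labels[int(pooled.argmax())])  # → car
```

With equal weights the two inputs above pool to roughly [0.567, 0.179, 0.254]. Geometric pooling lets either context strongly suppress a label it finds implausible, whereas arithmetic averaging would keep half of that label's mass from the other context.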

Findings

1. VSIM surpasses state-of-the-art performance on the SUN09 dataset.
2. Sharing labels between contexts improves object-naming accuracy.
3. Iterative Data Augmentation effectively combines visual and semantic cues.
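The page credits an iterative procedure with combining visual and semantic cues by switching between contexts. The paper's Data Augmentation algorithm is not reproduced here. The following is a simplified, deterministic stand-in in the style of iterated conditional modes (ICM), assuming (hypothetically) that the visual context is a per-object label distribution and the semantic context is a label co-occurrence matrix. Each pass re-labels one object at a time given the other objects' current labels and stops when a full pass changes nothing. The names `context_switching_inference` and `cooccurrence` are hypothetical.

```python
import numpy as np


def context_switching_inference(visual_probs, cooccurrence, n_iters=10):
    """Greedy (ICM-style) joint labeling; a simplified stand-in, not the paper's algorithm.

    visual_probs: (n_objects, n_labels) label probabilities per object from a
        hypothetical visual model.
    cooccurrence: (n_labels, n_labels) nonnegative label co-occurrence scores,
        used here as a hypothetical semantic context.
    """
    visual_probs = np.asarray(visual_probs, dtype=float)
    C = np.asarray(cooccurrence, dtype=float)
    n_objects, n_labels = visual_probs.shape
    eps = 1e-12
    # Start from the visual context alone.
    labels = visual_probs.argmax(axis=1)
    for _ in range(n_iters):
        changed = False
        for i in range(n_objects):
            # Semantic step: score each candidate label by its mean
            # co-occurrence with the current labels of the other objects.
            others = np.delete(labels, i)
            sem = C[:, others].mean(axis=1) if others.size else np.ones(n_labels)
            p_sem = (sem + eps) / (sem + eps).sum()
            # Visual step: combine with the visual evidence, keep the best label.
            score = np.log(visual_probs[i] + eps) + np.log(p_sem)
            best = int(score.argmax())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # no label moved during a full pass
            break
    return labels


names = ["sky", "sea", "road", "boat"]
visual = [[0.90, 0.05, 0.03, 0.02],   # clearly sky
          [0.05, 0.45, 0.48, 0.02],   # visually ambiguous: sea or road
          [0.02, 0.08, 0.05, 0.85]]   # clearly boat
cooc = [[0.1, 0.8, 0.8, 0.6],
        [0.8, 0.1, 0.1, 0.9],
        [0.8, 0.1, 0.1, 0.05],
        [0.6, 0.9, 0.05, 0.1]]
result = context_switching_inference(visual, cooc)
print([names[j] for j in result])  # → ['sky', 'sea', 'boat']
```

The ambiguous region starts as "road" from visual evidence alone, then switches to "sea" once the semantic context of "sky" and "boat" is taken into account.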

Abstract

We consider the problem of naming objects in complex, natural scenes containing widely varying object appearance and subtly different names. Informed by cognitive research, we propose an approach based on sharing context-based object hypotheses between visual and lexical spaces. To this end, we present the Visual Semantic Integration Model (VSIM), which represents object labels as entities shared between semantic and visual contexts and infers the labels of a new image by updating them through context switching. At the core of VSIM are a semantic Pachinko Allocation Model and a visual nearest-neighbor Latent Dirichlet Allocation Model. For inference, we derive an iterative Data Augmentation algorithm that pools the label probabilities and maximizes the joint label posterior of an image. Our model surpasses the performance of state-of-the-art methods in several visual tasks on the challenging SUN09 dataset.
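The abstract names a visual nearest-neighbor LDA model as VSIM's visual component. The sketch below covers only the nearest-neighbor idea, under stated assumptions: each image is described by a global feature vector, neighbors are found by Euclidean distance, and the visual label context for a query is the smoothed frequency of labels annotated in its k nearest training images. It does not fit an LDA topic model, and `nn_label_prior`, `k`, and `alpha` are hypothetical.

```python
import numpy as np


def nn_label_prior(query_feat, train_feats, train_label_sets, n_labels, k=3, alpha=1.0):
    """Smoothed label frequencies over the k visually nearest training images.

    Covers only neighbor retrieval; no LDA topic model is fit here.
    """
    q = np.asarray(query_feat, dtype=float)
    X = np.asarray(train_feats, dtype=float)
    # Euclidean distance from the query descriptor to every training descriptor.
    dists = np.linalg.norm(X - q, axis=1)
    nearest = np.argsort(dists)[:k]
    # Additive (Laplace-style) smoothing so unseen labels keep nonzero mass.
    counts = np.full(n_labels, alpha, dtype=float)
    for idx in nearest:
        for label in train_label_sets[idx]:
            counts[label] += 1.0
    return counts / counts.sum()


prior = nn_label_prior(
    [0.05, 0.0],
    train_feats=[[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]],
    train_label_sets=[[0, 1], [0, 1, 3], [2], [2]],
    n_labels=4,
    k=2,
)
print(prior)
```

Here the two nearest images contribute labels 0, 1, 0, 1 and 3, so with `alpha=1.0` the prior is [3, 3, 1, 2] / 9; label 2, seen only in distant images, keeps just its smoothing mass.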

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques