Visual-Semantic Scene Understanding by Sharing Labels in a Context Network
Ishani Chakraborty, Ahmed Elgammal

TL;DR
This paper introduces VSIM, a model that integrates visual and semantic contexts for improved object naming in complex scenes, using shared labels and iterative inference to enhance scene understanding.
Contribution
The paper proposes a novel Visual Semantic Integration Model that combines semantic and visual contexts via shared labels and an iterative inference algorithm, outperforming existing methods on the SUN09 dataset.
Findings
VSIM surpasses state-of-the-art performance on the SUN09 dataset.
Shared label approach improves object naming accuracy.
Iterative Data Augmentation effectively combines visual and semantic cues.
Abstract
We consider the problem of naming objects in complex, natural scenes containing widely varying object appearance and subtly different names. Informed by cognitive research, we propose an approach based on sharing context-based object hypotheses between visual and lexical spaces. To this end, we present the Visual Semantic Integration Model (VSIM), which represents object labels as entities shared between semantic and visual contexts and performs inference on a new image by updating labels through context switching. At the core of VSIM are a semantic Pachinko Allocation Model and a visual nearest-neighbor Latent Dirichlet Allocation Model. For inference, we derive an iterative Data Augmentation algorithm that pools the label probabilities and maximizes the joint label posterior of an image. Our model surpasses the performance of state-of-the-art methods in several visual tasks on the challenging SUN09 dataset.
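The abstract describes inference only at a high level: labels are updated by switching between visual and semantic context and pooling the resulting probabilities. The following is a minimal toy sketch of that general idea, under stated assumptions, and is not the authors' algorithm. The semantic Pachinko Allocation Model is replaced by a hypothetical symmetric label-compatibility matrix `cooc`, the nearest-neighbor LDA visual model by a given per-region likelihood table `vis`, and the Data Augmentation sampler by a deterministic coordinate-ascent (iterated conditional modes) update. All function names, labels, and numbers are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical stand-ins: vis[i][k] is region i's visual likelihood for label k;
# cooc[k][m] is a symmetric semantic compatibility between labels k and m.
# All entries must be strictly positive because the sketch takes logs.
Matrix = Sequence[Sequence[float]]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(xs)), key=lambda k: xs[k])


def pooled_conditional(i: int, labels: List[int], vis: Matrix, cooc: Matrix) -> List[float]:
    """Pool region i's visual evidence with semantic context taken from the
    current labels of all other regions; return normalized probabilities."""
    log_scores = []
    for k in range(len(vis[i])):
        s = math.log(vis[i][k])  # visual evidence for label k
        for j, lj in enumerate(labels):
            if j != i:
                s += math.log(cooc[k][lj])  # compatibility with region j's label
        log_scores.append(s)
    m = max(log_scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]


def joint_log_score(labels: List[int], vis: Matrix, cooc: Matrix) -> float:
    """Unnormalized log joint score of a full labeling under the toy model."""
    score = sum(math.log(vis[i][l]) for i, l in enumerate(labels))
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            score += math.log(cooc[labels[i]][labels[j]])
    return score


def iterative_label_inference(vis: Matrix, cooc: Matrix, max_iters: int = 10) -> List[int]:
    """Start from vision-only labels, then relabel each region in turn from
    its pooled conditional until a full sweep changes nothing."""
    labels = [argmax(row) for row in vis]
    for _ in range(max_iters):
        changed = False
        for i in range(len(labels)):
            best = argmax(pooled_conditional(i, labels, vis, cooc))
            if best != labels[i]:
                labels[i] = best  # in-place, so later regions see the update
                changed = True
        if not changed:
            break
    return labels


# Toy scene: region 0 clearly looks like a boat; region 1 looks slightly
# more like road than sea, but "sea" is far more compatible with "boat".
LABELS = ["boat", "sea", "road"]
vis = [[0.8, 0.1, 0.1],
       [0.1, 0.4, 0.5]]
cooc = [[0.1, 0.8, 0.1],
        [0.8, 0.5, 0.1],
        [0.1, 0.1, 0.5]]

labels = iterative_label_inference(vis, cooc)
print([LABELS[l] for l in labels])  # ['boat', 'sea']
```

Vision alone would label region 1 as "road"; pooling in the semantic context from region 0 flips it to "sea". Because each update maximizes one region's pooled conditional while the others stay fixed, and `cooc` is symmetric, the toy joint score never decreases across updates. A stochastic scheme would instead sample from these conditionals rather than take the argmax.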
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
