Sherlock: Scalable Fact Learning in Images

Mohamed Elhoseiny; Scott Cohen; Walter Chang; Brian Price; Ahmed Elgammal

arXiv:1511.04891·cs.CV·April 5, 2016

TL;DR

This paper introduces Sherlock, a scalable framework for understanding and modeling structured facts in images, enabling uniform recognition of objects, attributes, actions, and interactions simultaneously, with improved generalization and retrieval performance.

Contribution

Sherlock presents a unified approach to model diverse visual facts in images, introducing new models and datasets for structured fact learning and demonstrating their effectiveness.

Findings

01. Structured fact modeling improves visual understanding.

02. Proposed models outperform baselines in fact retrieval.

03. Large-scale dataset supports scalable fact learning.

Abstract

We study scalable and uniform understanding of facts in images. Existing visual recognition systems are typically modeled differently for each fact type such as objects, actions, and interactions. We propose a setting where all these facts can be modeled simultaneously with a capacity to understand unbounded number of facts in a structured way. The training data comes as structured facts in images, including (1) objects (e.g., <boy>), (2) attributes (e.g., <boy, tall>), (3) actions (e.g., <boy, playing>), and (4) interactions (e.g., <boy, riding, a horse>). Each fact has a semantic language view (e.g., <boy, playing>) and a visual view (an image with this fact). We show that learning visual facts in a structured way enables not only a uniform but also generalizable visual understanding. We propose and investigate recent and strong approaches from the multiview…
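
The four fact types listed in the abstract can be pictured as one record with up to three slots, paired with an image. The sketch below is only an illustration of that idea, not the authors' implementation: the class names `Fact` and `FactExample`, the field names `subject`, `predicate`, `obj`, and `image_path`, and the file paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """Language view of a structured fact (field names are hypothetical)."""
    subject: str                      # e.g. "boy"
    predicate: Optional[str] = None   # attribute or action, e.g. "tall", "playing"
    obj: Optional[str] = None         # second entity of an interaction, e.g. "a horse"

    def __post_init__(self) -> None:
        # An interaction such as <boy, riding, a horse> needs its middle slot.
        if self.obj is not None and self.predicate is None:
            raise ValueError("a fact with an object slot also needs a predicate")

    @property
    def order(self) -> int:
        """Number of filled slots: 1 object, 2 attribute/action, 3 interaction."""
        return 1 + (self.predicate is not None) + (self.obj is not None)

    def as_tuple(self) -> Tuple[str, ...]:
        """Filled slots in order, dropping the empty ones."""
        slots = (self.subject, self.predicate, self.obj)
        return tuple(s for s in slots if s is not None)


@dataclass(frozen=True)
class FactExample:
    """One training example: a language view plus a visual view."""
    fact: Fact
    image_path: str  # hypothetical path to an image showing the fact


examples = [
    FactExample(Fact("boy"), "images/0001.jpg"),
    FactExample(Fact("boy", "tall"), "images/0002.jpg"),
    FactExample(Fact("boy", "playing"), "images/0003.jpg"),
    FactExample(Fact("boy", "riding", "a horse"), "images/0004.jpg"),
]

for ex in examples:
    print(ex.fact.order, ex.fact.as_tuple())
```

Keeping every fact type in one record with optional slots mirrors the uniform treatment the abstract describes. Note that attributes and actions both fill two slots, so the slot count alone does not tell them apart.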
