The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
Jack Hessel, Jena D. Hwang, Jae Sung Park, Rowan Zellers, Chandra Bhagavatula, Anna Rohrbach, Kate Saenko, Yejin Choi

TL;DR
This paper introduces Sherlock, an annotated corpus of 103K images for testing machine abductive reasoning, and evaluates models' ability to infer beyond literal image content, highlighting the gap between machine and human reasoning.
Contribution
The paper presents a novel, large-scale abductive reasoning dataset for images and benchmarks models' capabilities to perform inference, localization, and human-like judgment.
Findings
Fine-tuned CLIP-RN50x64 outperforms baselines
Models show a significant gap relative to human performance
Dataset enables comprehensive evaluation of visual abductive reasoning
Abstract
Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might assume the street sits in a residential area (rather than on a highway), even if no houses are pictured. Can machines perform similar visual reasoning? We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. We adopt a free-viewing paradigm: participants first observe and identify salient clues within images (e.g., objects, actions) and then provide a plausible inference about the scene, given the clue. In…
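The annotation scheme described in the abstract pairs a clue with an inference drawn from it. The minimal sketch below shows one way such a record could be represented. It is not the dataset's actual schema: the class names, field names, the bounding-box layout, and the identifier `img_0001` are all assumptions made for illustration. Only the example clue and inference text come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Clue:
    # Assumed layout: (x, y, width, height) in pixels. The abstract does not
    # say how clue regions are encoded; this is a hypothetical choice.
    box: Tuple[int, int, int, int]
    # Literal description of what is visible (e.g., an object or action).
    text: str


@dataclass(frozen=True)
class Annotation:
    image_id: str   # hypothetical identifier, not a real dataset key
    clue: Clue      # the salient observation the annotator identified
    inference: str  # the plausible inference drawn from that clue


def inferences_for_image(annotations: List[Annotation], image_id: str) -> List[str]:
    """Collect every inference recorded for one image."""
    return [a.inference for a in annotations if a.image_id == image_id]


# Example built from the road-sign scenario in the abstract.
example = Annotation(
    image_id="img_0001",
    clue=Clue(box=(40, 60, 80, 120), text='a "20 mph" sign alongside a road'),
    inference="the street sits in a residential area",
)
```

Keeping the clue and the inference as separate fields mirrors the two-step procedure in the abstract, where participants first identify a clue and then provide an inference given that clue.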
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
