The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

Jack Hessel; Jena D. Hwang; Jae Sung Park; Rowan Zellers; Chandra Bhagavatula; Anna Rohrbach; Kate Saenko; Yejin Choi

arXiv:2202.04800 · cs.CV · July 26, 2022

TL;DR

This paper introduces Sherlock, a large dataset for testing machine abductive reasoning in images, and evaluates models' ability to infer beyond literal content, highlighting the gap between AI and human reasoning.

Contribution

The paper presents a novel, large-scale abductive reasoning dataset for images and benchmarks models' capabilities to perform inference, localization, and human-like judgment.

Findings

1. Fine-tuned CLIP-RN50x64 outperforms baselines
2. Models show significant gap compared to human performance
3. Dataset enables comprehensive evaluation of visual abductive reasoning

Abstract

Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might assume the street sits in a residential area (rather than on a highway), even if no houses are pictured. Can machines perform similar visual reasoning? We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. We adopt a free-viewing paradigm: participants first observe and identify salient clues within images (e.g., objects, actions) and then provide a plausible inference about the scene, given the clue. In…

Code & Models

Repositories

lunaproject22/rpa
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling