Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Aditya Chinchure; Sahithya Ravi; Raymond Ng; Vered Shwartz; Boyang Li; Leonid Sigal

arXiv:2412.05725·cs.CV·April 9, 2025

TL;DR

This paper introduces BlackSwanSuite, a benchmark for testing vision-language models' ability to reason about unexpected, out-of-distribution events in videos, revealing significant gaps compared to human performance.

Contribution

The paper presents a novel benchmark suite for evaluating abductive and defeasible reasoning in VLMs on atypical video events, highlighting current model limitations.

Findings

01 Current VLMs lag behind humans by up to 32% on these tasks.

02 Significant performance gaps indicate limitations in reasoning about unexpected events.

03 The benchmark enables targeted evaluation of reasoning capabilities beyond pattern recognition.

Abstract

The commonsense reasoning capabilities of vision-language models (VLMs), especially in abductive reasoning and defeasible reasoning, remain poorly understood. Most benchmarks focus on typical visual scenarios, making it difficult to discern whether model performance stems from keen perception and reasoning skills, or reliance on pure statistical recall. We argue that by focusing on atypical events in videos, clearer insights can be gained on the core capabilities of VLMs. Explaining and understanding such out-of-distribution events requires models to extend beyond basic pattern recognition and regurgitation of their prior knowledge. To this end, we introduce BlackSwanSuite, a benchmark for evaluating VLMs' ability to reason about unexpected events through abductive and defeasible tasks. Our tasks artificially limit the amount of visual information provided to models while questioning…

Taxonomy

Topics: Artificial Intelligence in Games · Video Analysis and Summarization · Multimodal Machine Learning Applications

Methods: Focus