CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering

Maitreya Patel; Tejas Gokhale; Chitta Baral; Yezhou Yang

arXiv:2211.03779 · cs.CV · November 8, 2022

TL;DR

This paper introduces CRIPP-VQA, a video question answering dataset for evaluating reasoning about implicit physical properties of objects, such as mass and friction, through counterfactual and planning questions about videos.

Contribution

The paper presents a new dataset, CRIPP-VQA, for reasoning about implicit physical properties in videos, and evaluates models on out-of-distribution scenarios involving unseen physical parameters.

Findings

01

Models perform significantly worse on questions about implicit properties than on questions about explicit properties.

02

The dataset enables evaluation of reasoning about unobserved physical attributes in dynamic scenes.

03

Out-of-distribution performance gaps highlight challenges in physical property reasoning.

Abstract

Videos often capture objects, their visible properties, their motion, and the interactions between different objects. Objects also have physical properties such as mass, which the imaging pipeline is unable to directly capture. However, these properties can be estimated by utilizing cues from relative object motion and the dynamics introduced by collisions. In this paper, we introduce CRIPP-VQA, a new video question answering dataset for reasoning about the implicit physical properties of objects in a scene. CRIPP-VQA contains videos of objects in motion, annotated with questions that involve counterfactual reasoning about the effect of actions, questions about planning in order to reach a goal, and descriptive questions about visible properties of objects. The CRIPP-VQA test set enables evaluation under several out-of-distribution settings -- videos with objects with masses,…
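As a rough illustration of how collision dynamics can reveal an implicit property such as mass, the sketch below recovers the mass ratio of two objects from their velocities before and after a one-dimensional collision, using conservation of momentum. This is not the method or code from the paper: the function name `mass_ratio_from_collision` and the example velocities are hypothetical, and the sketch assumes friction and other external forces are negligible during the collision.

```python
def mass_ratio_from_collision(v1_before, v1_after, v2_before, v2_after):
    """Return m1 / m2 implied by momentum conservation in a 1D collision.

    From m1*v1 + m2*v2 = m1*v1' + m2*v2' it follows that
    m1 / m2 = (v2' - v2) / (v1 - v1').
    Assumes no external impulse (e.g. friction) acts during the collision.
    """
    dv1 = v1_before - v1_after   # velocity lost by object 1
    dv2 = v2_after - v2_before   # velocity gained by object 2
    if dv1 == 0:
        raise ValueError("object 1's velocity did not change; ratio is undefined")
    return dv2 / dv1


# Hypothetical observation: object 1 slows from 3.0 to 1.0 while
# object 2, initially at rest, leaves the collision at 4.0.
ratio = mass_ratio_from_collision(3.0, 1.0, 0.0, 4.0)
print(ratio)  # 2.0, i.e. object 1 is twice as massive as object 2
```

Only the ratio of masses is recoverable from velocities alone; an absolute mass would need an additional known quantity, such as one object's mass or an applied force.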

Code & Models

Repositories

maitreyapatel/cripp-vqa (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Test