Measuring Compositional Consistency for Video Question Answering
Mona Gandhi, Mustafa Omer Gul, Eva Prakash, Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

TL;DR
This paper introduces a new benchmark and evaluation metrics for assessing the compositional reasoning abilities of video question answering models, revealing their limitations in reasoning correctly and consistently.
Contribution
The paper develops a question decomposition engine and uses it to build AGQA-Decomp, a large benchmark of question graphs for evaluating compositional reasoning in models.
Findings
Models often fail to reason correctly through compositions.
Models rely on incorrect reasoning or data biases.
Models can achieve high accuracy despite reasoning failures.
Abstract
Recent video question answering benchmarks indicate that state-of-the-art models struggle to answer compositional questions. However, it remains unclear which types of compositional reasoning cause models to mispredict. Furthermore, it is difficult to discern whether models arrive at answers using compositional reasoning or by leveraging data biases. In this paper, we develop a question decomposition engine that programmatically deconstructs a compositional question into a directed acyclic graph of sub-questions. The graph is designed such that each parent question is a composition of its children. We present AGQA-Decomp, a benchmark containing question graphs, with an average of sub-questions per graph, and total new sub-questions. Using question graphs, we evaluate three state-of-the-art models with a suite of novel compositional consistency metrics. We find…
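The question graph described in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' decomposition engine or their consistency metrics: the QuestionNode class, the helper functions, and the three example questions are all hypothetical. It stores a parent question with its child sub-questions, orders the graph so children come before parents, and flags parents that a model answered correctly while getting at least one child wrong.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QuestionNode:
    """One question in a graph; `children` holds ids of its sub-questions."""
    qid: str
    text: str
    children: List[str] = field(default_factory=list)


def topological_order(graph: Dict[str, QuestionNode]) -> List[str]:
    """Return question ids with children before parents; raise on a cycle."""
    order: List[str] = []
    state: Dict[str, int] = {}  # 0 = currently visiting, 1 = finished

    def visit(qid: str) -> None:
        if state.get(qid) == 1:
            return
        if state.get(qid) == 0:
            raise ValueError("question graph contains a cycle")
        state[qid] = 0
        for child in graph[qid].children:
            visit(child)
        state[qid] = 1
        order.append(qid)

    for qid in graph:
        visit(qid)
    return order


def correct_parent_wrong_child(
    graph: Dict[str, QuestionNode], correct: Dict[str, bool]
) -> List[str]:
    """Toy check: parents answered correctly despite an incorrect child."""
    return [
        qid
        for qid in topological_order(graph)
        if graph[qid].children
        and correct[qid]
        and not all(correct[c] for c in graph[qid].children)
    ]


# Hypothetical example graph and hypothetical model correctness flags.
graph = {
    "q_and": QuestionNode(
        "q_and", "Did they hold a cup and open a door?", ["q_cup", "q_door"]
    ),
    "q_cup": QuestionNode("q_cup", "Did they hold a cup?"),
    "q_door": QuestionNode("q_door", "Did they open a door?"),
}
correct = {"q_and": True, "q_cup": True, "q_door": False}
flagged = correct_parent_wrong_child(graph, correct)
print(flagged)
```

Here the parent is flagged because it was answered correctly even though one of its sub-questions was not, which is the kind of case the Findings describe as high accuracy despite a reasoning failure.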
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
