AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning
Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

TL;DR
AGQA 2.0 is an improved benchmark for evaluating models' ability to perform compositional spatio-temporal reasoning in videos, featuring stricter answer balancing to reduce biases and provide more reliable assessments.
Contribution
The paper introduces AGQA 2.0, an enhanced version of the benchmark with improved balancing procedures to better evaluate visual reasoning models.
Findings
Models show improved performance on AGQA 2.0
Biases are further reduced in the new benchmark
AGQA 2.0 provides a more reliable evaluation of compositional reasoning
Abstract
Prior benchmarks have analyzed models' answers to questions about videos in order to measure visual compositional reasoning. Action Genome Question Answering (AGQA) is one such benchmark. AGQA provides a training/test split with balanced answer distributions to reduce the effect of linguistic biases. However, some biases remain in several AGQA categories. We introduce AGQA 2.0, a version of this benchmark with several improvements, most notably a stricter balancing procedure. We then report results on the updated benchmark for all experiments.
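To make the idea of balancing answer distributions concrete, here is a minimal sketch of one generic approach: downsampling over-represented answers within each question category. It is an illustration only and is not the AGQA 2.0 balancing procedure, which this page does not describe. The function name `balance_answers`, the `category` and `answer` fields, and the `max_ratio` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def balance_answers(questions, max_ratio=1.0, seed=0):
    """Downsample questions so that, within each category, no answer
    occurs more than max_ratio times as often as the rarest answer.

    Hypothetical illustration; not the procedure used by AGQA 2.0.
    """
    rng = random.Random(seed)

    # Group questions by category, then by answer.
    by_category = defaultdict(lambda: defaultdict(list))
    for q in questions:
        by_category[q["category"]][q["answer"]].append(q)

    balanced = []
    for answers in by_category.values():
        rarest = min(len(qs) for qs in answers.values())
        cap = int(rarest * max_ratio)
        for qs in answers.values():
            # Keep at most `cap` questions per answer, chosen at random.
            balanced.extend(rng.sample(qs, min(cap, len(qs))))
    return balanced


# Toy data: a skewed "exists" category with 8 "yes" and 2 "no" answers.
toy = [{"category": "exists", "answer": "yes"} for _ in range(8)]
toy += [{"category": "exists", "answer": "no"} for _ in range(2)]
balanced = balance_answers(toy)
```

Under this sketch, a smaller `max_ratio` means stricter balancing: a model that always guesses the most frequent answer in a category gains less from the skew, at the cost of discarding more questions.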
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
