Understanding Complexity in VideoQA via Visual Program Generation

Cristobal Eyzaguirre; Igor Vasiljevic; Achal Dave; Jiajun Wu; Rares Andrei Ambrus; Thomas Kollar; Juan Carlos Niebles; Pavel Tokmakov

arXiv:2505.13429 · cs.CV · May 20, 2025

TL;DR

This paper introduces a data-driven method for assessing question complexity in VideoQA and for generating more complex questions. It measures the complexity of the code automatically generated for each question, a signal that correlates better with model performance than human difficulty estimates.

Contribution

The paper presents an automatic approach that uses code complexity as a proxy for question difficulty, enabling scalable analysis and generation of challenging VideoQA questions.
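
To make the idea of "code complexity as a proxy" concrete, here is a minimal sketch of one possible measure. It is an assumption for illustration, not the paper's metric: it parses a generated program with Python's standard `ast` module and counts control-flow nodes and function calls. The names `code_complexity`, `BRANCH_NODES`, and the example program (with its `detect` and `query` calls) are hypothetical.

```python
import ast

# Assumption: control-flow nodes stand in for decision points, and function
# calls stand in for invoked visual primitives. Neither choice is from the paper.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.BoolOp, ast.IfExp)


def code_complexity(source: str) -> dict:
    """Return simple structural counts for a generated program."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    branches = sum(isinstance(n, BRANCH_NODES) for n in nodes)
    calls = sum(isinstance(n, ast.Call) for n in nodes)
    # A cyclomatic-style score: one base path plus every branch and call.
    return {"branches": branches, "calls": calls, "score": 1 + branches + calls}


# Hypothetical program a code-generation model might emit for a question.
# It is only parsed here, never executed, so detect/query need not exist.
example_program = '''
def answer(video):
    for frame in video:
        if detect(frame, "dog"):
            return query(frame, "what is the dog doing?")
    return "unknown"
'''

result = code_complexity(example_program)
print(result)
```

A score of this kind could then be compared against per-question model accuracy. Which structural features matter most is an empirical question that this sketch does not answer.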

Findings

01. Code complexity correlates better with model performance than human estimates.

02. The proposed method can generate questions 1.9 times more difficult than existing benchmarks.

03. The approach identifies primitives that predict question difficulty across models.

Abstract

We propose a data-driven approach to analyzing query complexity in Video Question Answering (VideoQA). Previous efforts in benchmark design have relied on human expertise to design challenging questions, yet we experimentally show that humans struggle to predict which questions are difficult for machine learning models. Our automatic approach leverages recent advances in code generation for visual question answering, using the complexity of generated code as a proxy for question difficulty. We demonstrate that this measure correlates significantly better with model performance than human estimates. To operationalize this insight, we propose an algorithm for estimating question complexity from code. It identifies fine-grained primitives that correlate with the hardest questions for any given set of models, making it easy to scale to new approaches in the future. Finally, to further…
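
The abstract does not spell out how primitives are related to difficulty, so the following is only a rough sketch under stated assumptions. It supposes each question comes with the set of primitives used in its generated code and a flag for whether a model answered correctly, then ranks primitives by how much their presence raises the error rate. The function `primitive_error_gap`, the record layout, and the toy data are all hypothetical.

```python
def primitive_error_gap(records):
    """Rank primitives by error rate when present minus error rate when absent.

    records: list of (set_of_primitive_names, answered_correctly) pairs.
    This is an illustrative statistic, not the paper's algorithm.
    """
    primitives = set().union(*(prims for prims, _ in records))
    gaps = {}
    for prim in primitives:
        errors_with = [not ok for prims, ok in records if prim in prims]
        errors_without = [not ok for prims, ok in records if prim not in prims]
        if not errors_with or not errors_without:
            continue  # the gap is undefined if one group is empty
        rate_with = sum(errors_with) / len(errors_with)
        rate_without = sum(errors_without) / len(errors_without)
        gaps[prim] = rate_with - rate_without
    # Largest gap first: these primitives co-occur most with model failures.
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


# Toy data, invented purely for illustration.
records = [
    ({"detect"}, True),
    ({"detect", "track"}, False),
    ({"detect", "track", "count"}, False),
    ({"detect", "count"}, True),
    ({"query"}, True),
]

gaps = primitive_error_gap(records)
hardest = next(iter(gaps))
print(hardest, gaps[hardest])
```

Because the statistic is recomputed from whatever model outcomes are supplied, the same procedure can be rerun for a new set of models, which is in the spirit of the abstract's claim that the approach is easy to scale to new approaches.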

Videos

Understanding Complexity in VideoQA via Visual Program Generation · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Sparse Evolutionary Training