Understanding Complexity in VideoQA via Visual Program Generation
Cristobal Eyzaguirre, Igor Vasiljevic, Achal Dave, Jiajun Wu, Rares Andrei Ambrus, Thomas Kollar, Juan Carlos Niebles, Pavel Tokmakov

TL;DR
This paper introduces a data-driven method for assessing and generating complex questions in VideoQA. It measures the complexity of automatically generated code, and this measure tracks how difficult a question is for models better than human estimates do.
Contribution
It presents an automatic approach using code complexity as a proxy for question difficulty, enabling scalable analysis and generation of challenging VideoQA questions.
Findings
Code complexity correlates better with model performance than human estimates.
The proposed method can generate questions 1.9 times more difficult than existing benchmarks.
The approach identifies primitives that predict question difficulty across models.
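One way a primitive-level difficulty analysis could look is sketched below. It assumes each question's generated program has already been reduced to the set of primitives it calls, and that each question has an accuracy averaged over the models under study. This summary does not describe the paper's actual algorithm. The accuracy-gap heuristic, the QuestionRecord and rank_primitives names, and the primitive names (find, count, temporal_order) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionRecord:
    """Hypothetical per-question record (not the paper's data format)."""
    primitives: set   # primitives called by the question's generated program
    accuracy: float   # fraction of the studied models that answered correctly


def rank_primitives(records):
    """Rank primitives by the accuracy gap they are associated with.

    gap(p) = mean accuracy on questions WITHOUT p - mean accuracy on questions WITH p.
    A larger gap suggests p appears in harder questions (a simple heuristic).
    """
    all_prims = set().union(*(r.primitives for r in records))
    ranked = []
    for prim in all_prims:
        with_p = [r.accuracy for r in records if prim in r.primitives]
        without_p = [r.accuracy for r in records if prim not in r.primitives]
        if not with_p or not without_p:
            continue  # gap is undefined if every (or no) question uses prim
        gap = sum(without_p) / len(without_p) - sum(with_p) / len(with_p)
        ranked.append((prim, gap))
    # Largest gap first; ties broken alphabetically for a deterministic order
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy, made-up records for illustration only
records = [
    QuestionRecord({"find", "temporal_order"}, 0.25),
    QuestionRecord({"find"}, 1.0),
    QuestionRecord({"find", "count"}, 0.75),
    QuestionRecord({"find", "temporal_order"}, 0.0),
]
for prim, gap in rank_primitives(records):
    print(f"{prim}: {gap:+.2f}")
# temporal_order: +0.75
# count: -0.33
```

A positive gap means questions whose programs use the primitive are answered less often than questions that do not use it. Primitives present in every program (here, find) are skipped because the comparison has no control group.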
Abstract
We propose a data-driven approach to analyzing query complexity in Video Question Answering (VideoQA). Previous efforts in benchmark design have relied on human expertise to design challenging questions, yet we experimentally show that humans struggle to predict which questions are difficult for machine learning models. Our automatic approach leverages recent advances in code generation for visual question answering, using the complexity of generated code as a proxy for question difficulty. We demonstrate that this measure correlates significantly better with model performance than human estimates. To operationalize this insight, we propose an algorithm for estimating question complexity from code. It identifies fine-grained primitives that correlate with the hardest questions for any given set of models, making it easy to scale to new approaches in the future. Finally, to further…
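The sketch below makes "complexity of generated code" concrete. It assumes the generated programs are Python source, as in code-generation approaches such as ViperGPT. The summary does not say which measure the authors use, so the proxies here are assumptions: AST node count, a McCabe-style count of decision points, and the set of called functions. The example programs, their primitives (query, exists, find, before), and the per-question accuracies are also hypothetical. A Spearman rank correlation then checks whether higher complexity goes with lower accuracy.

```python
import ast
import math


def complexity_features(source: str) -> dict:
    """Hypothetical complexity proxies for one generated program (parsed, not run)."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    decisions = 0
    primitives = set()
    for node in nodes:
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp)):
            decisions += 1  # each branch or loop is one decision point
        elif isinstance(node, ast.comprehension):
            decisions += 1 + len(node.ifs)  # the loop clause plus its filters
        elif isinstance(node, ast.BoolOp):
            decisions += len(node.values) - 1  # `a and b and c` adds 2
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                primitives.add(func.id)
            elif isinstance(func, ast.Attribute):
                primitives.add(func.attr)
    return {
        "ast_nodes": len(nodes),
        "cyclomatic": 1 + decisions,
        "calls": sum(isinstance(n, ast.Call) for n in nodes),
        "primitives": sorted(primitives),
    }


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical generated programs (only parsed, never executed)
programs = [
    "ans = query(frame, 'what color is the car?')",
    "for f in frames:\n    if exists(f, 'dog'):\n        ans = query(f, 'what is the dog doing?')",
    "objs = [o for f in frames for o in find(f, 'person') if o.moving]\n"
    "ans = len(objs) > 2 and before(objs[0], objs[1])",
]
accuracy = [0.9, 0.6, 0.3]  # made-up per-question model accuracy

scores = [complexity_features(p)["cyclomatic"] for p in programs]
print(scores)                      # [1, 3, 5]
print(spearman(scores, accuracy))  # -1.0
```

A rank correlation is used rather than Pearson's r because only the ordering of difficulty matters here. Tied complexity scores receive their average rank.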
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Sparse Evolutionary Training
