VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
Hanoona Rasheed, Abdelrahman Shaker, Anqi Tang, Muhammad Maaz, Ming-Hsuan Yang, Salman Khan, Fahad Shahbaz Khan

TL;DR
VideoMathQA introduces a comprehensive benchmark for evaluating multimodal mathematical reasoning in videos, emphasizing the integration of visual, audio, and textual information over extended timeframes to assess model reasoning capabilities.
Contribution
This work presents VideoMathQA, a benchmark of diverse, multimodal, and temporally extended mathematical questions, with high-quality annotations designed to evaluate reasoning beyond perception.
Findings
Existing models struggle with multimodal, long-duration reasoning tasks.
The benchmark reveals significant gaps in current model capabilities.
Multi-step reasoning annotations enable detailed diagnosis of model performance.
Abstract
Mathematical reasoning in real-world video settings presents a fundamentally different challenge than in static images or text. It requires interpreting fine-grained visual information, accurately reading handwritten or digital text, and integrating spoken cues, often dispersed non-linearly over time. In such multimodal contexts, success hinges not just on perception, but on selectively identifying and integrating the right contextual details from a rich and noisy stream of content. To this end, we introduce VideoMathQA, a benchmark designed to evaluate whether models can perform such temporally extended cross-modal reasoning on videos. The benchmark spans 10 diverse mathematical domains, covering videos ranging from 10 seconds to over 1 hour. It requires models to interpret structured visual content, understand instructional narratives, and jointly ground concepts across visual, audio,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Mathematics Education and Teaching Techniques · Intelligent Tutoring Systems and Adaptive Learning
