SYNCR: A Cross-Video Reasoning Benchmark with Synthetic Grounding

Sara Ghazanfari; Siddharth Garg; Prashanth Krishnamurthy; Farshad Khorrami

arXiv:2605.08412 · cs.CV · May 12, 2026

TL;DR

SYNCR is a synthetic multi-video reasoning benchmark for evaluating and diagnosing how multimodal large language models reason across independent video streams. It covers eight tasks in four diagnostic pillars, and current models reach only 52.5% accuracy against a human baseline of 89.5%.

Contribution

The paper introduces SYNCR, a synthetic multi-video reasoning benchmark with programmatically verified grounding, which allows precise evaluation of models' cross-video reasoning across eight diagnostic tasks.

Findings

01

Current models achieve only 52.5% accuracy, far below the human baseline of 89.5%.

02

Models excel at temporal ordering but struggle with physical and spatial reasoning.

03

Parameter scaling and specialized training improve temporal alignment but not fine-grained physical tracking.
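
To make the headline numbers concrete, the following is a minimal Python sketch of the arithmetic behind the reported gap, plus one way per-pillar accuracy could be tallied. The two accuracy figures come from the findings above and the pillar names from the abstract. Everything else is an assumption for illustration: the function names, the `(pillar, is_correct)` record layout, and the toy records are hypothetical and do not reflect the SYNCR release format or its results.

```python
from collections import defaultdict

# Figures reported in the findings above (percent).
MODEL_ACCURACY = 52.5
HUMAN_BASELINE = 89.5


def accuracy_gap(model_acc: float, human_acc: float) -> float:
    """Absolute gap, in percentage points, between human and model accuracy."""
    return human_acc - model_acc


def per_pillar_accuracy(records):
    """Return accuracy (percent) per pillar from (pillar, is_correct) pairs.

    The record layout is hypothetical, not the SYNCR release format.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for pillar, is_correct in records:
        totals[pillar] += 1
        correct[pillar] += int(is_correct)
    return {pillar: 100.0 * correct[pillar] / totals[pillar] for pillar in totals}


# Made-up toy records for illustration only; these are not results from the paper.
toy_records = [
    ("Temporal Alignment", True),
    ("Temporal Alignment", True),
    ("Spatial Tracking", True),
    ("Spatial Tracking", False),
]

gap = accuracy_gap(MODEL_ACCURACY, HUMAN_BASELINE)
pillar_scores = per_pillar_accuracy(toy_records)

print(f"Gap to human baseline: {gap:.1f} percentage points")
for pillar, score in pillar_scores.items():
    print(f"{pillar}: {score:.1f}%")
```

With the reported figures the gap is 37.0 percentage points. Breaking accuracy down by pillar rather than reporting a single number is what lets a benchmark of this kind separate temporal ordering from physical and spatial tracking failures.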

Abstract

Multimodal Large Language Models (MLLMs) have made rapid progress in single-video understanding, yet their ability to reason across multiple independent video streams remains poorly understood. Existing multi-video benchmarks rely largely on human-annotated real-world footage, limiting the precision of spatial, temporal, and physical ground truth and making it difficult to diagnose model failures. We introduce SYNCR, a controlled synthetic benchmark for cross-video reasoning with programmatically verified grounding. Built using Habitat, Kubric, and CLEVRER simulator engines, SYNCR contains 8,163 multi-video question-answer pairs grounded in 9,650 unique videos. It evaluates MLLMs across eight tasks spanning four diagnostic pillars: Temporal Alignment, Spatial Tracking, Comparative Reasoning, and Holistic Synthesis. Our zero-shot evaluation of leading open- and closed-weight MLLMs…

Code & Models

Repositories

SaraGhazanfari/SYNCR (GitHub)
