Video-Oasis: Rethinking Evaluation of Video Understanding

Geuntaek Lim; Minho Shim; Sungjune Park; Jaeyun Lee; Inwoong Lee; Taeoh Kim; Dongyoon Wee; Yukyung Choi

arXiv:2603.29616 · cs.CV · April 1, 2026

TL;DR

Video-Oasis is a diagnostic suite that audits existing video understanding benchmarks. It finds that 54% of benchmark samples are solvable without visual input or temporal context, and that state-of-the-art models barely exceed random guessing on the remaining samples.

Contribution

The paper introduces a systematic framework for evaluating existing video understanding benchmarks, highlights their limitations, and provides guidelines for more robust future models.

Findings

01. 54% of benchmark samples are solvable without visual or temporal input.

02. On the remaining samples, state-of-the-art models barely outperform random guessing.

03. The paper provides practical guidelines for designing more effective video understanding algorithms.
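The first two findings describe a filter-then-score diagnostic, which can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Sample` fields, the `needs_video` rule, and the idea of using a text-only ("blind") run and a single-frame run as probes are all assumptions made here for illustration. A sample counts as a shortcut if either probe answers it correctly, and the remaining samples are scored against the chance level implied by their number of answer options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # All field names are hypothetical; they are not taken from the paper or its repo.
    sample_id: str
    num_options: int        # number of multiple-choice options
    answer: int             # index of the correct option
    blind_pred: int         # prediction from question text only (no video)
    single_frame_pred: int  # prediction from one frame (no temporal context)
    full_video_pred: int    # prediction with the full video


def needs_video(s: Sample) -> bool:
    """Assumed rule: a sample needs video if both shortcut probes fail on it."""
    return s.blind_pred != s.answer and s.single_frame_pred != s.answer


def summarize(samples: list[Sample]) -> dict[str, Optional[float]]:
    """Report the shortcut rate, and accuracy vs. chance on the remaining samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    remaining = [s for s in samples if needs_video(s)]
    shortcut_rate = 1 - len(remaining) / len(samples)
    if not remaining:
        return {"shortcut_rate": shortcut_rate, "accuracy": None, "chance": None}
    accuracy = sum(s.full_video_pred == s.answer for s in remaining) / len(remaining)
    # Chance level is the mean of 1/k over the remaining samples (k = option count).
    chance = sum(1 / s.num_options for s in remaining) / len(remaining)
    return {"shortcut_rate": shortcut_rate, "accuracy": accuracy, "chance": chance}
```

A single correct blind prediction can be a lucky guess, so a more careful version of this filter would likely aggregate over several models or repeated runs before labeling a sample as solvable without video.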

Abstract

The inherent complexity of video understanding makes it difficult to attribute whether performance gains stem from visual perception, linguistic reasoning, or knowledge priors. While many benchmarks have emerged to assess high-level reasoning, the essential criteria that constitute video understanding remain largely overlooked. Instead of introducing yet another benchmark, we take a step back to re-examine the current landscape of video understanding. In this work, we provide Video-Oasis, a sustainable diagnostic suite designed to systematically evaluate existing evaluations and distill spatio-temporal challenges for video understanding. Our analysis reveals two critical findings: (1) 54% of existing benchmark samples are solvable without visual input or temporal context, and (2) on the remaining samples, state-of-the-art models exhibit performance barely exceeding random guessing. To…

Code & Models

Repositories

sejong-rcv/Video-Oasis (GitHub)
