EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports

Jianzhe Ma; Zhonghao Cao; Shangkui Chen; Yichen Xu; Wenxuan Wang; Qin Jin

arXiv:2604.12320 · cs.CV · April 21, 2026

TL;DR

EgoEsportsQA introduces a new benchmark dataset for evaluating perception and reasoning in egocentric esports videos, highlighting current model limitations in high-velocity virtual environments.

Contribution

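The paper presents a novel QA benchmark of 1,745 QA pairs curated from professional matches in 3 first-person shooter games. The questions are organized into a two-dimensional taxonomy (11 cognitive-capability sub-tasks and 6 esports-knowledge sub-tasks) to evaluate the perception and reasoning capabilities of Video-LLMs.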

Findings

1. Current Video-LLMs achieve only 71.58% accuracy on the benchmark.
2. Models perform better in perception than in tactical reasoning.
3. Deep micro-operations remain challenging for existing models.

Abstract

While video large language models (Video-LLMs) excel in understanding slow-paced, real-world egocentric videos, their capabilities in high-velocity, information-dense virtual environments remain under-explored. Existing benchmarks focus on daily activities, yet lack a rigorous testbed for evaluating fast, rule-bound reasoning in virtual scenarios. To fill this gap, we introduce EgoEsportsQA, a pioneering video question-answering (QA) benchmark for grounding perception and reasoning in expert esports knowledge. We curate 1,745 high-quality QA pairs from professional matches across 3 first-person shooter games via a scalable six-stage pipeline. These questions are structured into a two-dimensional decoupled taxonomy: 11 sub-tasks in the cognitive capability dimension (covering perception and reasoning levels) and 6 sub-tasks in the esports knowledge dimension. Comprehensive evaluations of…
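
To make the two-dimensional decoupled taxonomy concrete, below is a minimal Python sketch of how accuracy could be broken down along each axis. It assumes each question has a single gold answer compared to the model's prediction by exact match, and that every QA pair is tagged with one cognitive-capability sub-task (belonging to either the perception or the reasoning level) and one esports-knowledge sub-task. The `QAItem` record, the `accuracy_breakdown` helper, and all sub-task names are hypothetical illustrations, not the benchmark's released schema or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAItem:
    # Hypothetical record layout: field and sub-task names are illustrative only.
    qid: str
    capability: str  # one of the 11 cognitive-capability sub-tasks
    level: str       # "perception" or "reasoning"
    knowledge: str   # one of the 6 esports-knowledge sub-tasks
    answer: str      # gold answer, compared by exact match


def accuracy_breakdown(items, predictions):
    """Return accuracy overall, per level, per capability, and per knowledge sub-task."""
    buckets = defaultdict(lambda: [0, 0])  # key -> [num_correct, num_total]
    for item in items:
        correct = int(predictions.get(item.qid) == item.answer)
        # Each item counts once on every axis, since the two dimensions are decoupled.
        for key in (("overall", "all"),
                    ("level", item.level),
                    ("capability", item.capability),
                    ("knowledge", item.knowledge)):
            buckets[key][0] += correct
            buckets[key][1] += 1
    return {key: c / t for key, (c, t) in buckets.items()}


# Toy data with hypothetical sub-task names and made-up predictions.
items = [
    QAItem("q1", "object_recognition", "perception", "weapons", "A"),
    QAItem("q2", "state_tracking", "perception", "map", "B"),
    QAItem("q3", "tactical_prediction", "reasoning", "map", "C"),
    QAItem("q4", "tactical_prediction", "reasoning", "economy", "D"),
]
preds = {"q1": "A", "q2": "B", "q3": "A", "q4": "D"}

scores = accuracy_breakdown(items, preds)
for key, acc in sorted(scores.items()):
    print(key, f"{acc:.2f}")
```

Because the axes are decoupled, a single item contributes to one bucket on each axis, so a low score can be localized to a cognitive level, a knowledge area, or their combination.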