SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain

Lingtao Mao; Huangyu Dai; Xinyu Sun; Zihan Liang; Ben Chen; Chenyi Lei; Wenwu Ou

arXiv:2605.17946 · cs.AI · May 21, 2026


TL;DR

This paper introduces SVFSearch, a comprehensive benchmark for short-video frame search in the Chinese gaming domain, evaluating multimodal models' retrieval and reasoning capabilities.

Contribution

SVFSearch is the first open, domain-specific benchmark for short-video frame search. It comes with a standardized evaluation environment, and the paper uses it to analyze a range of retrieval and reasoning paradigms.

Findings

01. The best open-source model achieves 66.4% accuracy.

02. Practical agentic search improves accuracy to 79.1%.

03. Providing oracle knowledge raises accuracy to 95.4%, highlighting the gap that remains.
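
To make the size of these gaps concrete, the arithmetic can be sketched as below. This is only an illustration: the three percentages come from the findings above and the test-set size from the abstract, but the assumption that all three were measured on the full 5,000-example test set is ours, and the names `REPORTED`, `gap`, and `implied_correct` are hypothetical helpers, not part of any SVFSearch release.

```python
# Accuracies (in percent) as quoted in the findings.
REPORTED = {
    "best_open_source": 66.4,
    "agentic_search": 79.1,
    "oracle_knowledge": 95.4,
}

TEST_SIZE = 5000   # four-choice test examples, per the abstract
CHANCE = 100 / 4   # uniform guessing over four options, in percent


def gap(higher: str, lower: str) -> float:
    """Difference in percentage points between two reported settings."""
    return round(REPORTED[higher] - REPORTED[lower], 1)


def implied_correct(setting: str) -> int:
    """Correct answers implied IF the accuracy covers all TEST_SIZE items.

    That condition is an assumption made for illustration only.
    """
    return round(REPORTED[setting] / 100 * TEST_SIZE)


if __name__ == "__main__":
    print("chance baseline (%):", CHANCE)
    print("gain from agentic search (pts):",
          gap("agentic_search", "best_open_source"))
    print("remaining gap to oracle (pts):",
          gap("oracle_knowledge", "agentic_search"))
    for name in REPORTED:
        print(name, "->", implied_correct(name), "of", TEST_SIZE)
```

Under these assumptions, agentic search adds 12.7 points over the best open-source model, while 16.3 points still separate it from the oracle-knowledge setting.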

Abstract

Multimodal large language models are increasingly used as agent backbones that understand multimodal inputs, plan retrieval actions, invoke external tools, and reason over retrieved information. Yet existing benchmarks rarely evaluate this ability in short-video applications, where a paused frame is often visually ambiguous and answering requires vertical, long-tail, and fast-evolving domain knowledge. We introduce SVFSearch, the first open benchmark for short-video frame search in the Chinese gaming domain. SVFSearch contains 5,000 four-choice test examples and 4,198 auxiliary training examples, each centered on a paused game scene from a real short-video clip. To support fair and reproducible evaluation, SVFSearch provides a frozen offline retrieval environment with a game-domain text corpus, a topic-linked image gallery, and text, image, and multimodal retrieval interfaces, avoiding…


Code & Models

Datasets

svfsearch/SVFSearchData
dataset· 4.1k dl
