OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding

Minghang Zheng; Zihao Yin; Yi Yang; Yuxin Peng; Yang Liu

arXiv:2604.25276·cs.CV·April 29, 2026

TL;DR

OmniVTG introduces a large-scale dataset and a Self-Correction Chain-of-Thought training paradigm to improve open-world video temporal grounding, targeting the limited semantic diversity of existing datasets and the weaker performance on rare concepts.

Contribution

The paper presents the OmniVTG dataset and a Self-Correction CoT training method, which together enhance the grounding ability of MLLMs in open-world video understanding tasks.

Findings

01

OmniVTG achieves state-of-the-art zero-shot performance on VTG benchmarks.

02

Self-Correction CoT training reduces the performance gap between common and rare concepts.

03

The dataset covers a broader semantic space than previous datasets.

Abstract

Video Temporal Grounding (VTG), the task of localizing video segments from text queries, struggles in open-world settings due to limited dataset scale and semantic diversity, causing performance gaps between common and rare concepts. To overcome these limitations, we introduce OmniVTG, a new large-scale dataset for open-world VTG, coupled with a Self-Correction Chain-of-Thought (CoT) training paradigm designed to enhance the grounding capabilities of Multimodal Large Language Models (MLLMs). Our OmniVTG is constructed via a novel Semantic Coverage Iterative Expansion pipeline, which first identifies gaps in the vocabulary of existing datasets and collects videos that are highly likely to contain these target concepts. For high-quality annotation, we leverage the insight that modern MLLMs excel at dense captioning more than direct grounding and design a caption-centric data engine to…
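The first step the abstract describes, identifying gaps in the vocabulary of existing datasets, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `concept_counts` and `find_vocabulary_gaps`, the whitespace-level tokenization, the `min_count` threshold, and the toy queries are all assumptions made for illustration.

```python
import re
from collections import Counter


def concept_counts(queries):
    """Count lowercase word tokens across a list of text queries."""
    counts = Counter()
    for query in queries:
        counts.update(re.findall(r"[a-z]+", query.lower()))
    return counts


def find_vocabulary_gaps(existing_queries, target_concepts, min_count=1):
    """Return target concepts seen fewer than min_count times in existing queries.

    Hypothetical helper: a real pipeline would likely use a richer notion of
    "concept" than single words (phrases, lemmas, or embeddings).
    """
    counts = concept_counts(existing_queries)
    return [c for c in target_concepts if counts[c.lower()] < min_count]


# Toy data standing in for the queries of an existing VTG dataset.
existing = ["a person opens the door", "the man pours water into a glass"]
targets = ["door", "glass", "kayak", "welding"]

gaps = find_vocabulary_gaps(existing, targets)
print(gaps)
```

Concepts returned by such a check would then serve as search targets when collecting new videos, which is the role the abstract assigns to the gap-identification step.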

Code & Models

Repositories

oceanflowlab/OmniVTG
github

Models

zhengmh/OmniVTG-7B
model · 54 dl · ♡ 5

Datasets

zhengmh/OmniVTG-Dataset
dataset · 138 dl
