Foresee-to-Ground: From Predictive Temporal Perception to Evidence-Driven Reasoning for Video Temporal Grounding

Zelin Zheng; Xinyan Liu; Ruixin Li; Antoni B. Chan; Guorong Li; Qingming Huang; Laiyun Qing

arXiv:2605.21973·cs.CV·May 22, 2026


TL;DR

Foresee-to-Ground (F2G) improves video temporal grounding by combining predictive temporal perception with evidence-driven reasoning, raising grounding accuracy across benchmarks and transferring across Video-LLM backbones.

Contribution

The paper introduces F2G, a framework that reformulates VTG as a verifiable Identify-then-Measure problem, decoupling event identification from precise boundary measurement.

Findings

01. F2G improves grounding accuracy across multiple benchmarks.

02. F2G transfers robustly across different Video-LLM backbones.

03. F2G maintains general video understanding capabilities.

Abstract

Current Video-LLM approaches for Video Temporal Grounding (VTG) typically rely on direct timestamp generation from an unstructured visual-token stream, often leading to brittle numerics and inconsistent boundaries. To address this, we propose Foresee-to-Ground (F2G), a framework that reformulates VTG as a verifiable Identify-then-Measure problem. F2G integrates Predictive Temporal Perception with Evidence-Driven Reasoning: it learns boundary-sensitive temporal representations to build a video-wide evidence pool of candidate event segments, and exposes these segments to the LLM as citable evidence units that bind boundary prediction to explicit event hypotheses. By decoupling event identification from precise boundary measurement, F2G stabilizes grounding and makes predictions verifiable. Extensive experiments demonstrate that F2G consistently improves grounding accuracy across diverse…
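
The Identify-then-Measure idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `EvidenceUnit` type, the `[E2]`-style citation format, the captions, and every function name below are assumptions made for illustration. The sketch shows one plausible reading, in which candidate segments are rendered as citable units, the model's answer is resolved to a cited unit (identify), and the boundaries are then read from that unit and checked against any predicted span (measure and verify).

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EvidenceUnit:
    """A candidate event segment (hypothetical structure, not from the paper)."""
    uid: str      # citable identifier such as "E2" (assumed format)
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    caption: str  # short description of the candidate event (assumed field)


def render_evidence_pool(pool: List[EvidenceUnit]) -> str:
    """Serialize the pool as text the LLM could cite from (assumed prompt format)."""
    return "\n".join(
        f"[{u.uid}] {u.start:.1f}s-{u.end:.1f}s: {u.caption}" for u in pool
    )


def resolve_citation(answer: str, pool: List[EvidenceUnit]) -> Optional[Tuple[float, float]]:
    """Identify the cited unit, then read its boundaries.

    Returns None when the answer cites nothing or cites an unknown unit,
    which is what makes the prediction checkable in this sketch.
    """
    by_id = {u.uid: u for u in pool}
    match = re.search(r"\[(E\d+)\]", answer)
    if match is None or match.group(1) not in by_id:
        return None
    unit = by_id[match.group(1)]
    return (unit.start, unit.end)


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection-over-union of two (start, end) spans in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Illustrative data only; these segments are invented for the example.
pool = [
    EvidenceUnit("E1", 0.0, 4.5, "person enters the kitchen"),
    EvidenceUnit("E2", 4.5, 12.0, "person opens the fridge"),
    EvidenceUnit("E3", 12.0, 20.0, "person pours a drink"),
]
span = resolve_citation("The query matches [E2].", pool)
```

In this sketch an answer that cites no unit, or a unit outside the pool, resolves to `None` rather than to a free-form timestamp, and `temporal_iou` can compare a separately predicted span against the cited unit. How F2G actually builds the pool and binds boundary prediction to evidence is described in the paper, not here.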
