TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang, Feng Cheng, Ziyang Wang, Huiyu Wang, Md Mohaiminul Islam, Lorenzo Torresani, Mohit Bansal, Gedas Bertasius, David Crandall

TL;DR
TimeRefine introduces a progressive temporal refining approach for video grounding with LLMs, improving localization accuracy through iterative offset prediction and auxiliary supervision.
Contribution
It reformulates temporal grounding as a multi-step refinement process and adds an auxiliary loss to enhance temporal perception in Video LLMs.
Findings
Achieves a 3.6% mIoU improvement on ActivityNet
Achieves a 5.0% mIoU improvement on Charades-STA
Integrates effectively into existing LLM-based temporal grounding methods
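The findings above are stated in mIoU. For reference, mIoU in temporal grounding is conventionally the mean, over all test queries, of the temporal IoU between the predicted and ground-truth segments, usually reported as a percentage. The sketch below shows that standard computation; the function names are illustrative and are not taken from the paper's code.

```python
from typing import Sequence, Tuple

# A segment is a (start, end) pair in seconds.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Segment], gts: Sequence[Segment]) -> float:
    """Average temporal IoU over all queries, expressed as a percentage."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need equally many, non-zero predictions and ground truths")
    return 100.0 * sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Example: one partial overlap (IoU = 5/15) and one exact match (IoU = 1).
print(round(mean_iou([(10.0, 20.0), (0.0, 10.0)], [(15.0, 25.0), (0.0, 10.0)]), 2))  # 66.67
```

Using the summed lengths minus the intersection as the union keeps the formula correct for both overlapping and disjoint segments.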
Abstract
Video temporal grounding aims to localize relevant temporal boundaries in a video given a textual prompt. Recent work has focused on enabling Video LLMs to perform video temporal grounding via next-token prediction of temporal timestamps. However, accurately localizing timestamps in videos remains challenging for Video LLMs when relying solely on temporal token prediction. Our proposed TimeRefine addresses this challenge in two ways. First, instead of directly predicting the start and end timestamps, we reformulate the temporal grounding task as a temporal refining task: the model first makes rough predictions and then refines them by predicting offsets to the target segment. This refining process is repeated multiple times, through which the model progressively self-improves its temporal localization accuracy. Second, to enhance the model's temporal perception capabilities, we…
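The refining loop described in the abstract can be illustrated with a small sketch. It assumes a hypothetical `predict_offsets` callable that maps the current (start, end) estimate and the step index to a pair of offsets. In TimeRefine these offsets come from the Video LLM itself; here a toy oracle stands in for the model, rounding the remaining error to a grid that shrinks from step to step (4 s, 1 s, 0.25 s, values chosen only for illustration). The sketch shows the arithmetic of iterative offset refinement, not the paper's actual output format, granularity schedule, or training procedure.

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds
# Hypothetical interface: (current estimate, step index) -> (start offset, end offset)
OffsetPredictor = Callable[[Segment, int], Tuple[float, float]]


def refine_segment(initial: Segment, predict_offsets: OffsetPredictor, num_steps: int) -> List[Segment]:
    """Start from a rough segment and repeatedly add predicted offsets.

    Returns every intermediate estimate, beginning with the initial one.
    """
    history = [initial]
    start, end = initial
    for step in range(num_steps):
        d_start, d_end = predict_offsets((start, end), step)
        start, end = start + d_start, end + d_end
        if end < start:  # keep the segment well-ordered
            start, end = end, start
        history.append((start, end))
    return history


def make_toy_oracle(target: Segment, granularities: List[float]) -> OffsetPredictor:
    """Toy stand-in for the Video LLM (an assumption, not the paper's model):
    predicts the remaining error, rounded to a coarse-to-fine grid."""
    def predict(current: Segment, step: int) -> Tuple[float, float]:
        g = granularities[step]
        return (round((target[0] - current[0]) / g) * g,
                round((target[1] - current[1]) / g) * g)
    return predict


target = (12.3, 27.8)
oracle = make_toy_oracle(target, granularities=[4.0, 1.0, 0.25])
for seg in refine_segment((0.0, 40.0), oracle, num_steps=3):
    print(seg)
```

Starting from the whole 40 s video, the estimate becomes (12.0, 28.0) after the coarse step, is unchanged at the 1 s step because the remaining error is under half a second, and reaches (12.25, 27.75) at the finest step, so the error never grows from one step to the next.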
Taxonomy
Topics: Video Analysis and Summarization · Time Series Analysis and Forecasting · Natural Language Processing Techniques
