Keeping the Evidence Chain: Semantic Evidence Allocation for Training-Free Token Pruning in Video Temporal Grounding

Jiaqi Li; Shuntian Zheng; Yixian Shen; Jia-Hong Huang; Xiaoman Lu; Minzhe Ni; Yu Guan

arXiv:2603.05663 · cs.CV · March 9, 2026

Open Access

TL;DR

This paper introduces SemVID, a training-free token pruning method for Video Temporal Grounding that preserves query-critical evidence and cross-frame connectivity, cutting prefill cost while retaining most of the grounding accuracy.

Contribution

SemVID is presented as the first training-free framework for VTG that allocates the token budget and selects tokens according to Evidence Retention and Connectivity Strength.

Findings

01. Retains up to 95.4% mIoU with only 12.5% of tokens

02. Achieves up to 5.8x speedup in prefill time

03. Outperforms prior methods under similar token budgets

Abstract

Video Temporal Grounding (VTG) localizes the temporal boundaries of a query-relevant moment in long, untrimmed videos, making video-language-model (VLM) pipelines prohibitively expensive. While recent training-free visual token pruning has shown success in video question answering, naively applying these objectives to VTG often causes drastic degradation, as VTG crucially depends on boundary-sensitive evidence and cross-frame reasoning chains. We therefore identify two VTG-specific pruning principles: Evidence Retention (ER), which keeps query-critical patches especially around event boundaries, and Connectivity Strength (CS), which preserves token-level cross-frame connectivity for long-range evidence aggregation. Building on these insights, we propose SemVID, a training-free pruning framework that constructs a compact yet coherent token subset with complementary semantic roles. SemVID…
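
To make the two principles concrete, the following is a minimal sketch of budgeted, training-free token selection. It is not SemVID's actual procedure, which the truncated abstract does not specify. Everything here is an assumption made for illustration: the function name `prune_tokens`, the use of query cosine similarity as a stand-in for Evidence Retention, the use of best-match similarity to adjacent frames as a stand-in for Connectivity Strength, and the mixing weight `alpha`.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prune_tokens(tokens, query, keep_ratio=0.125, alpha=0.5):
    """Hypothetical budgeted token selection (illustrative, not the paper's method).

    tokens: (F, P, D) array of visual tokens, F frames with P patches each.
    query:  (D,) array embedding the text query.
    Returns a boolean keep mask of shape (F, P).
    """
    F, P, _ = tokens.shape
    t = l2_normalize(tokens)
    q = l2_normalize(query)

    # Assumed proxy for Evidence Retention: cosine similarity to the query.
    evidence = t @ q  # (F, P)

    # Assumed proxy for Connectivity Strength: each token's best cosine
    # match in its neighboring frames, averaged over available neighbors.
    connectivity = np.zeros((F, P))
    if F > 1:
        counts = np.zeros((F, 1))
        # sim[f, p, r] = similarity of token p in frame f to token r in frame f+1
        sim = np.einsum("fpd,frd->fpr", t[:-1], t[1:])
        connectivity[:-1] += sim.max(axis=2)  # frame f looking forward
        connectivity[1:] += sim.max(axis=1)   # frame f+1 looking backward
        counts[:-1] += 1
        counts[1:] += 1
        connectivity /= counts

    score = alpha * evidence + (1.0 - alpha) * connectivity

    # Keep the top-k tokens under the global budget.
    k = max(1, int(round(keep_ratio * F * P)))
    keep_idx = np.argpartition(score.ravel(), -k)[-k:]
    mask = np.zeros(F * P, dtype=bool)
    mask[keep_idx] = True
    return mask.reshape(F, P)

# Usage: 8 frames of 16 patches each, at the 12.5% budget from the findings.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16, 32))
query = rng.normal(size=32)
mask = prune_tokens(tokens, query)
pruned = tokens[mask]  # shape (number_kept, 32)
```

A single global top-k is the simplest possible choice; it gives no guarantee that tokens near event boundaries survive, which is the failure mode the abstract attributes to objectives borrowed from video question answering.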

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis