Finding Optimal Video Moment without Training: Gaussian Boundary Optimization for Weakly Supervised Video Grounding

Sunoh Kim; Kimin Yun; Daeho Um

arXiv:2602.03071 · cs.CV · February 5, 2026

Open Access

TL;DR

This paper introduces Gaussian Boundary Optimization (GBO), a training-free inference method for weakly supervised video grounding that improves localization accuracy by solving a principled optimization problem, achieving state-of-the-art results.

Contribution

GBO provides a novel, training-free inference framework with a closed-form solution for better video segment localization in weakly supervised settings.

Findings

01. GBO significantly improves localization accuracy.

02. GBO achieves state-of-the-art results on benchmarks.

03. GBO is compatible with various proposal architectures.

Abstract

Weakly supervised temporal video grounding aims to localize query-relevant segments in untrimmed videos using only video-sentence pairs, without requiring ground-truth segment annotations that specify exact temporal boundaries. Recent approaches tackle this task by utilizing Gaussian-based temporal proposals to represent query-relevant segments. However, their inference strategies rely on heuristic mappings from Gaussian parameters to segment boundaries, resulting in suboptimal localization performance. To address this issue, we propose Gaussian Boundary Optimization (GBO), a novel inference framework that predicts segment boundaries by solving a principled optimization problem that balances proposal coverage and segment compactness. We derive a closed-form solution for this problem and rigorously analyze the optimality conditions under varying penalty regimes. Beyond its theoretical…
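To make the coverage-versus-compactness trade-off concrete, here is a minimal toy sketch. It is not the paper's formulation, which this excerpt does not show. It assumes an unnormalized Gaussian proposal g(t) = exp(-(t - c)^2 / (2 w^2)) on normalized video time [0, 1] with the center c inside that range, and a hypothetical objective that rewards the proposal mass covered by a segment [s, e] while charging a linear penalty on its length. All function names and the `penalty` parameter are illustrative assumptions. Under this toy objective the stationarity condition is g(s) = g(e) = penalty, which gives a closed-form half-width when 0 < penalty < 1 and a collapsed segment when penalty >= 1.

```python
import math


def toy_objective(start, end, center, width, penalty):
    """Toy score: covered Gaussian mass minus a linear length penalty.

    Assumption: this objective is illustrative only, not the paper's.
    The integral of exp(-(t - center)^2 / (2 * width^2)) over [start, end]
    is evaluated exactly with the error function.
    """
    scale = width * math.sqrt(math.pi / 2.0)
    upper = math.erf((end - center) / (width * math.sqrt(2.0)))
    lower = math.erf((start - center) / (width * math.sqrt(2.0)))
    coverage = scale * (upper - lower)
    return coverage - penalty * (end - start)


def toy_boundaries(center, width, penalty):
    """Closed-form maximizer of toy_objective on normalized time [0, 1].

    Assumes 0 <= center <= 1 and width > 0.
    """
    if penalty <= 0.0:
        raise ValueError("penalty must be positive")
    if penalty >= 1.0:
        # The Gaussian never exceeds 1, so any nonzero length loses score.
        return center, center
    # Solve exp(-h^2 / (2 * width^2)) = penalty for the half-width h.
    half = width * math.sqrt(2.0 * math.log(1.0 / penalty))
    # The score is unimodal in each boundary, so clipping stays optimal.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    lam = math.exp(-0.5)  # at this penalty the half-width equals `width`
    s, e = toy_boundaries(center=0.5, width=0.1, penalty=lam)
    print(f"segment = [{s:.3f}, {e:.3f}]")
    print(f"score   = {toy_objective(s, e, 0.5, 0.1, lam):.4f}")
```

In this sketch a small penalty widens the segment and a large one shrinks it toward the Gaussian center, which is one simple way a mapping from Gaussian parameters to boundaries can follow from an explicit objective instead of a fixed heuristic.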

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning