Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Yimeng Zhang, Xin Chen, Jinghan Jia, Sijia Liu, Ke Ding

TL;DR
This paper introduces a text-visual prompting (TVP) framework for temporal video grounding with 2D visual features, reporting about 5x faster inference than 3D CNN-based methods and up to a 30.77% performance improvement on ActivityNet Captions.
Contribution
The authors propose a prompting approach that makes TVG effective with 2D visual features, removing the need for 3D CNN feature extraction, and introduce a new loss function for improved learning.
Findings
Achieves up to 30.77% performance improvement on ActivityNet Captions.
Provides 5x faster inference compared to 3D CNN-based methods.
Demonstrates effectiveness on the Charades-STA and ActivityNet Captions datasets.
Abstract
In this paper, we study the problem of temporal video grounding (TVG), which aims to predict the starting/ending time points of moments described by a text sentence within a long untrimmed video. Benefiting from fine-grained 3D visual features, the TVG techniques have achieved remarkable progress in recent years. However, the high complexity of 3D convolutional neural networks (CNNs) makes extracting dense 3D visual features time-consuming, which calls for intensive memory and computing resources. Towards efficient TVG, we propose a novel text-visual prompting (TVP) framework, which incorporates optimized perturbation patterns (that we call 'prompts') into both visual inputs and textual features of a TVG model. In sharp contrast to 3D CNNs, we show that TVP allows us to effectively co-train vision encoder and language encoder in a 2D TVG model and improves the performance of crossmodal…
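The abstract describes prompts as optimized perturbation patterns added to both the visual inputs and the textual features. The sketch below illustrates that idea only in the simplest form and is not the authors' implementation: the border-shaped visual prompt, the single shared text prompt vector, the clipping to [0, 1], and the function names `apply_visual_prompt` and `apply_text_prompt` are all assumptions made for illustration. In a real system the prompt values would be learned during training; here they are fixed constants.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame, pixel values in [0, 1]


def apply_visual_prompt(frame: Frame, prompt: Frame, pad: int) -> Frame:
    """Add `prompt` to the outer `pad`-pixel border of `frame`, clipping to [0, 1].

    Assumption: the perturbation is restricted to a border region; the source
    only says that a perturbation pattern is added to the visual input.
    """
    h, w = len(frame), len(frame[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            on_border = i < pad or i >= h - pad or j < pad or j >= w - pad
            value = frame[i][j] + (prompt[i][j] if on_border else 0.0)
            row.append(min(1.0, max(0.0, value)))
        out.append(row)
    return out


def apply_text_prompt(features: List[List[float]], prompt: List[float]) -> List[List[float]]:
    """Add the same prompt vector to every token's feature vector (assumed design)."""
    return [[x + p for x, p in zip(token, prompt)] for token in features]


# Toy inputs: a 4x4 frame and two 2-dimensional token features.
frame = [[0.5] * 4 for _ in range(4)]
visual_prompt = [[0.25] * 4 for _ in range(4)]  # hypothetical fixed values
prompted_frame = apply_visual_prompt(frame, visual_prompt, pad=1)

text_features = [[1.0, 2.0], [3.0, 4.0]]
prompted_text = apply_text_prompt(text_features, [0.5, -0.5])
```

In this toy setup only the border pixels of the frame change, while every token feature is shifted by the same vector. Both prompts act on inputs or features rather than on encoder weights, which is the property the abstract emphasizes.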
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
