Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen, Xin Wang, Hong Chen, Zihan Song, Jia Jia, Wenwu Zhu

TL;DR
This paper introduces Grounding-Prompter, a novel method that prompts large language models with multimodal information to improve temporal sentence grounding in long videos, addressing challenges of complex contexts and multiple modalities.
Contribution
It proposes a prompting strategy that gives an LLM textualized multimodal inputs (speech and visual content), combined with a boundary-perceptive approach, so that the LLM can perform TSG in long videos, a setting that existing short-video methods fail to handle.
Findings
Achieves state-of-the-art performance on long video TSG tasks.
Demonstrates the effectiveness of multimodal prompting for complex temporal reasoning.
Shows significant improvements over existing short-video-focused methods.
Abstract
Temporal Sentence Grounding (TSG), which aims to localize moments from videos based on the given natural language queries, has attracted widespread attention. Existing works are mainly designed for short videos, failing to handle TSG in long videos, which poses two challenges: i) complicated contexts in long videos require temporal reasoning over longer moment sequences, and ii) multiple modalities including textual speech with rich information require special designs for content understanding in long videos. To tackle these challenges, in this work we propose a Grounding-Prompter method, which is capable of conducting TSG in long videos through prompting LLM with multimodal information. In detail, we first transform the TSG task and its multimodal inputs including speech and visual, into compressed task textualization. Furthermore, to enhance temporal reasoning under complicated…
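To make the textualization step described in the abstract concrete, here is a minimal sketch of how timestamped speech and visual descriptions might be flattened into a single text prompt for an LLM. It is an illustration under stated assumptions, not the authors' implementation: the `Segment` type, the `fmt_time` and `textualize` helpers, the prompt wording, and the `MM:SS` time format are all hypothetical, and the paper's compression and boundary-perceptive steps are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One timestamped chunk of a video (hypothetical structure)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speech: str    # assumed to come from a speech transcript
    caption: str   # assumed to come from a visual captioning model


def fmt_time(seconds: float) -> str:
    """Render seconds as MM:SS, truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def textualize(segments: List[Segment], query: str) -> str:
    """Build a plain-text grounding prompt from segments and a query."""
    lines = [
        "Task: find the start and end time of the moment described by the query.",
        "",
    ]
    for seg in segments:
        span = f"[{fmt_time(seg.start)}-{fmt_time(seg.end)}]"
        lines.append(f"{span} speech: {seg.speech} | visual: {seg.caption}")
    lines.append("")
    lines.append(f"Query: {query}")
    lines.append("Answer with a time span in the form MM:SS-MM:SS.")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Segment(0, 30, "Welcome to the show.", "a man stands in a kitchen"),
        Segment(30, 75, "Now we chop the onions.", "hands chopping onions"),
    ]
    print(textualize(demo, "the person chops onions"))
```

Keeping the timestamps inline with each segment is one simple way to let a text-only model refer back to positions in the video; how the actual method encodes time and compresses long transcripts is described in the full paper.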
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
