LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization
Zirui Shang, Xinxiao Wu, Shuo Yang

TL;DR
This paper uses large language models to expand queries with descriptions of action start and end boundaries, improving the accuracy and robustness of language-driven action localization in videos.
Contribution
It proposes a boundary-aware query expansion method with semantic similarity modeling to reduce boundary uncertainty and enhance training stability.
Findings
Improved boundary prediction accuracy across multiple datasets.
Enhanced training stability with boundary probability modeling.
Method is compatible with existing localization models.
Abstract
Language-driven action localization in videos requires not only semantic alignment between language query and video segment, but also prediction of action boundaries. However, the language query primarily describes the main content of an action and usually lacks specific details of action start and end boundaries, which increases the subjectivity of manual boundary annotation and leads to boundary uncertainty in training data. In this paper, on one hand, we propose to expand the original query by generating textual descriptions of the action start and end boundaries through LLMs, which can provide more detailed boundary cues for localization and thus reduce the impact of boundary uncertainty. On the other hand, to enhance the tolerance to boundary uncertainty during training, we propose to model probability scores of action boundaries by calculating the semantic similarities…
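The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The prompt wording, the function names `build_expansion_prompt` and `boundary_probabilities`, the `temperature` parameter, and the toy feature vectors are all hypothetical. Because the abstract is truncated before it states what the similarities are computed between, the sketch assumes cosine similarity between per-frame features and an embedding of a boundary description, normalized with a softmax.

```python
import math

def build_expansion_prompt(query: str) -> str:
    """Hypothetical prompt asking an LLM to describe the action's boundaries."""
    return (
        f"Action: {query}\n"
        "Describe in one sentence what is visible at the moment the action starts, "
        "and in one sentence what is visible at the moment it ends."
    )

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def boundary_probabilities(frame_feats, text_feat, temperature=0.1):
    """Assumed formulation: softmax over frame-to-text cosine similarities.

    Returns one probability per frame, so training can target a soft
    distribution over frames rather than a single annotated boundary frame.
    """
    sims = [cosine(f, text_feat) / temperature for f in frame_feats]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Toy usage with made-up 2-D features for three frames and a
# made-up embedding of the generated start-boundary description.
frames = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
start_text = [1.0, 0.1]
print(build_expansion_prompt("person opens the door"))
print(boundary_probabilities(frames, start_text))
```

In this toy setup the first frame is closest to the start description, so it receives the highest probability while neighboring frames keep some mass. That soft target is one way to express tolerance to uncertain boundary annotations.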
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Online Learning and Analytics
