Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

Yiyang Jiang; Wengyu Zhang; Xulu Zhang; Xiaoyong Wei; Chang Wen Chen; Qing Li

arXiv:2407.15051 · cs.CV · September 17, 2024

TL;DR

This paper explores using large language model encoders to incorporate prior knowledge and pseudo-events into video moment retrieval, improving inter-concept relation modeling and achieving state-of-the-art results.

Contribution

The paper introduces a framework that uses LLM encoders to refine multimodal embeddings in VMR, avoiding the limitations of LLM decoders, and shows that this refinement capability transfers to other embeddings.

Findings

01. LLM encoders effectively refine inter-concept relations in multimodal embeddings.

02. Refinement capabilities transfer to other embeddings, such as BLIP and T5, that show similar inter-concept patterns.

03. The approach achieves state-of-the-art performance in video moment retrieval.

Abstract

In this paper, we investigate the feasibility of leveraging large language models (LLMs) for integrating general knowledge and incorporating pseudo-events as priors for temporal content distribution in video moment retrieval (VMR) models. The motivation behind this study arises from the limitations of using LLMs as decoders for generating discrete textual descriptions, which hinder their direct application to continuous outputs like salience scores and inter-frame embeddings that capture inter-frame relations. To overcome these limitations, we propose utilizing LLM encoders instead of decoders. Through a feasibility study, we demonstrate that LLM encoders effectively refine inter-concept relations in multimodal embeddings, even without being trained on textual embeddings. We also show that the refinement capability of LLM encoders can be transferred to other embeddings, such as BLIP…
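
As a rough illustration of what "refining inter-concept relations" in a set of embeddings could look like, the toy sketch below passes a few made-up frame embeddings through one self-attention step and compares their pairwise cosine similarities before and after. This is not the authors' implementation (see the linked repository for that). The function names (`refine`, `relation_matrix`), the identity projections, the residual connection, and the embedding values are all assumptions chosen to keep the example small and dependency-free.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def refine(embeddings):
    """One self-attention pass with identity projections plus a residual.

    Hypothetical stand-in for a frozen encoder layer; a real LLM encoder
    has learned projections, multiple heads, and feed-forward blocks.
    """
    d = len(embeddings[0])
    refined = []
    for q in embeddings:
        # Scaled dot-product attention weights of this embedding over all others.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in embeddings])
        mixed = [sum(w * v[i] for w, v in zip(weights, embeddings)) for i in range(d)]
        # Residual connection: original embedding plus attended mixture.
        refined.append([x + m for x, m in zip(q, mixed)])
    return refined

def relation_matrix(embeddings):
    # Pairwise cosine similarities, used here as a proxy for "inter-concept relations".
    return [[cosine(a, b) for b in embeddings] for a in embeddings]

# Made-up per-frame embeddings (illustrative values only).
frames = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
before = relation_matrix(frames)
after = relation_matrix(refine(frames))
```

Comparing `before` and `after` shows how a single attention pass changes the pairwise similarity structure. In the paper's setting, the refinement would come from pretrained LLM encoder layers rather than the identity projections assumed here.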

Code & Models

Repositories

fletcherjiang/llmepet (PyTorch, official)

Taxonomy

Methods: Gated Linear Unit · Attention Is All You Need · Byte Pair Encoding · Inverse Square Root Schedule · SentencePiece · Dropout · Contrastive Language-Image Pre-training · Layer Normalization · Linear Layer