RAPID: Retrieval-Augmented Parallel Inference Drafting for Text-Based Video Event Retrieval

Long Nguyen; Huy Nguyen; Bao Khuu; Huy Luu; Huy Le; Tuan Nguyen; Tho Quan

arXiv:2501.16303·cs.CL·January 29, 2025

Open Access

TL;DR

RAPID is a novel retrieval system that uses large language models to enrich and process text queries for more accurate video event retrieval, especially when queries lack context.

Contribution

The paper introduces RAPID, a retrieval-augmented system leveraging LLMs and prompt-based learning to improve text-based video event retrieval with incomplete queries.

Findings

01. RAPID outperforms traditional methods on custom datasets.

02. The system demonstrates high speed and accuracy in real-world scenarios.

03. RAPID showed superior performance in the Ho Chi Minh City AI Challenge 2024.

Abstract

Retrieving events from videos using text queries has become increasingly challenging due to the rapid growth of multimedia content. Existing methods for text-based video event retrieval often focus heavily on object-level descriptions, overlooking the crucial role of contextual information. This limitation is especially apparent when queries lack sufficient context, such as missing location details or ambiguous background elements. To address these challenges, we propose a novel system called RAPID (Retrieval-Augmented Parallel Inference Drafting), which leverages advancements in Large Language Models (LLMs) and prompt-based learning to semantically correct and enrich user queries with relevant contextual information. These enriched queries are then processed through parallel retrieval, followed by an evaluation step to select the most relevant results based on their alignment with the…
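
The pipeline the abstract describes (enrich the query into several drafts, retrieve for each draft in parallel, then evaluate the pooled results) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: `draft_queries` stands in for the LLM enrichment step, `retrieve` uses simple token overlap instead of a real retriever, the `CORPUS` data is invented, and `evaluate` scores candidates against the original query because the abstract is truncated before it states the exact alignment criterion.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus of event descriptions (hypothetical data, for illustration only).
CORPUS = {
    "clip_01": "a man rides a motorbike on a flooded street in the city at night",
    "clip_02": "a chef cooks noodles in a restaurant kitchen",
    "clip_03": "children play football on a beach at sunset",
}


def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def draft_queries(query, contexts):
    """Hypothetical stand-in for LLM enrichment: one draft per candidate context."""
    return [f"{query} {ctx}" for ctx in contexts]


def retrieve(draft, top_k=2):
    """Rank clips by token overlap with a draft (placeholder for a real retriever)."""
    q = tokens(draft)
    scored = [(len(q & tokens(desc)), clip) for clip, desc in CORPUS.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by clip id
    return [clip for score, clip in scored[:top_k] if score > 0]


def evaluate(query, candidates):
    """Assumed criterion: keep the candidate that best matches the original query."""
    q = tokens(query)
    return max(sorted(candidates), key=lambda clip: len(q & tokens(CORPUS[clip])))


def rapid_sketch(query, contexts):
    drafts = draft_queries(query, contexts)
    # Run one retrieval per draft concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(retrieve, drafts))
    # Pool the hits from all drafts, then select a single best result.
    candidates = {clip for hits in results for clip in hits}
    return evaluate(query, candidates)


best = rapid_sketch(
    "man rides motorbike",
    ["on a flooded street", "in a kitchen", "on a beach"],
)
print(best)  # clip_01
```

In this sketch the drafts differ only in the context appended to the query, so a draft with the wrong context can still surface the right clip, and the final evaluation step filters out clips that matched only on the added context.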

Taxonomy

Topics: Video Analysis and Summarization · Topic Modeling · Natural Language Processing Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus