Towards Retrieval Augmented Generation over Large Video Libraries

Yannis Tevissen; Khalil Guetari; Frédéric Petitpont

arXiv:2406.14938·cs.CL·June 24, 2024

Open Access

TL;DR

This paper introduces Video Library Question Answering (VLQA), an architecture that combines retrieval and generation techniques to enable efficient querying and content creation over large video libraries using large language models.

Contribution

It presents a novel interoperable system that retrieves relevant video segments using speech and visual metadata and generates precise answers with timestamps, advancing multimedia retrieval.

Findings

1. Demonstrates effective retrieval of relevant video moments
2. Integrates LLMs for query generation and answer synthesis
3. Shows potential for AI-assisted video content creation

Abstract

Video content creators need efficient tools to repurpose content, a task that often requires complex manual or automated searches. Crafting a new video from large video libraries remains a challenge. In this paper we introduce the task of Video Library Question Answering (VLQA) through an interoperable architecture that applies Retrieval Augmented Generation (RAG) to video libraries. We propose a system that uses large language models (LLMs) to generate search queries, retrieving relevant video moments indexed by speech and visual metadata. An answer generation module then integrates user queries with this metadata to produce responses with specific video timestamps. This approach shows promise in multimedia content retrieval and AI-assisted video content creation.
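The three stages named in the abstract (query generation, retrieval over speech and visual metadata, and answer generation with timestamps) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Moment` schema, the function names, the stopword list, and the sample library are all hypothetical, the LLM query generator is replaced by a simple stopword filter, and retrieval is reduced to keyword overlap so that the example runs without any external service.

```python
import re
from dataclasses import dataclass


@dataclass
class Moment:
    """One indexed video moment (hypothetical schema, not from the paper)."""
    video_id: str
    start_s: float
    end_s: float
    transcript: str       # speech metadata
    visual_caption: str   # visual metadata


# Assumed stopword list, used only by the stand-in query generator below.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "me",
             "find", "show", "where", "is", "are", "clips", "about"}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def generate_search_query(user_request):
    """Stand-in for the LLM query-generation step: keep content words only."""
    words = re.findall(r"[a-z0-9]+", user_request.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


def retrieve(query, library, top_k=2):
    """Rank moments by keyword overlap with speech and visual metadata."""
    query_tokens = tokenize(query)
    scored = []
    for moment in library:
        metadata = tokenize(moment.transcript) | tokenize(moment.visual_caption)
        score = len(query_tokens & metadata)
        if score > 0:
            scored.append((score, moment))
    # Stable sort: ties keep library order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [moment for _, moment in scored[:top_k]]


def generate_answer(user_request, moments):
    """Stand-in for the answer module: list retrieved moments with timestamps."""
    if not moments:
        return "No matching moments found."
    lines = [f"Moments relevant to: {user_request}"]
    for m in moments:
        lines.append(
            f"- {m.video_id} [{m.start_s:.1f}s-{m.end_s:.1f}s]: {m.transcript}"
        )
    return "\n".join(lines)


def answer(user_request, library, top_k=2):
    """Run the full sketch: query generation, retrieval, answer generation."""
    query = generate_search_query(user_request)
    return generate_answer(user_request, retrieve(query, library, top_k))


# Invented sample data for illustration only.
LIBRARY = [
    Moment("vid_001", 12.0, 27.5,
           "we unveil the new electric car today", "a red car on a stage"),
    Moment("vid_002", 3.0, 9.0,
           "welcome to the cooking show", "a chef chopping onions"),
    Moment("vid_003", 40.0, 55.0,
           "charging the car takes thirty minutes", "a car plugged into a charger"),
]

if __name__ == "__main__":
    print(answer("Find clips about the electric car", LIBRARY))
```

In a real system each stand-in would presumably be swapped for an LLM call or a proper search index; the sketch only shows how timestamps travel from the indexed metadata into the final response.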

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Multimodal Machine Learning Applications

MethodsLib