LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

Ying Wang; Yanlai Yang; Mengye Ren

arXiv:2312.05269·cs.CV·November 7, 2024·1 cites

TL;DR

LifelongMemory is a framework that answers questions about long egocentric videos by having a large language model reason over generated video descriptions. A confidence and explanation module makes its answers interpretable.

Contribution

Introduces LifelongMemory, which combines video description generation with zero-shot LLM reasoning for question answering and retrieval over long-form egocentric video.

Findings

01. Achieves state-of-the-art performance on the EgoSchema question-answering benchmark.

02. Performs competitively on the Ego4D natural language query (NLQ) challenge.

03. Provides a confidence and explanation module that makes answers interpretable.

Abstract

In this paper we introduce LifelongMemory, a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval. LifelongMemory generates concise video activity descriptions of the camera wearer and leverages the zero-shot capabilities of pretrained large language models to perform reasoning over long-form video context. Furthermore, LifelongMemory uses a confidence and explanation module to produce confident, high-quality, and interpretable answers. Our approach achieves state-of-the-art performance on the EgoSchema benchmark for question answering and is highly competitive on the natural language query (NLQ) challenge of Ego4D. Code is available at https://github.com/agentic-learning-ai-lab/lifelong-memory.
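The pipeline the abstract describes (captions of the camera wearer's activity, zero-shot LLM reasoning over them, and an answer that carries a confidence and an explanation) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation; the official code is in the repository linked above. The names `Caption`, `Answer`, `merge_captions`, `build_prompt`, `answer_question`, and `fake_llm` are all hypothetical, as are the JSON reply format, the 1 to 3 confidence scale, and the step that merges repeated captions. The LLM is passed in as a plain callable so the sketch runs without any external service.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Caption:
    start: float  # seconds from the start of the video
    end: float
    text: str     # short description of the camera wearer's activity


@dataclass
class Answer:
    choice: int       # index of the selected option
    confidence: int   # assumed scale: 1 (low) to 3 (high)
    explanation: str


def merge_captions(captions: List[Caption]) -> List[Caption]:
    """Collapse consecutive identical captions into one longer interval
    (an assumed way to keep the description concise)."""
    merged: List[Caption] = []
    for cap in captions:
        if merged and merged[-1].text == cap.text:
            merged[-1] = Caption(merged[-1].start, cap.end, cap.text)
        else:
            merged.append(cap)
    return merged


def build_prompt(captions: List[Caption], question: str, options: List[str]) -> str:
    """Format timestamped captions, the question, and the options as one prompt."""
    lines = [f"[{c.start:.0f}s-{c.end:.0f}s] {c.text}" for c in captions]
    opts = [f"{i}. {o}" for i, o in enumerate(options)]
    return (
        "Captions describing what the camera wearer does:\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}\nOptions:\n"
        + "\n".join(opts)
        + '\n\nReply with JSON: {"choice": int, "confidence": 1-3, "explanation": str}'
    )


def answer_question(
    captions: List[Caption],
    question: str,
    options: List[str],
    llm: Callable[[str], str],
) -> Answer:
    """Ask the LLM and parse its reply into an Answer with confidence and explanation."""
    prompt = build_prompt(merge_captions(captions), question, options)
    reply = json.loads(llm(prompt))
    return Answer(int(reply["choice"]), int(reply["confidence"]), str(reply["explanation"]))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same canned reply."""
    return json.dumps(
        {"choice": 1, "confidence": 3, "explanation": "Most captions describe washing dishes."}
    )


if __name__ == "__main__":
    captions = [
        Caption(0, 2, "opens the tap"),
        Caption(2, 4, "washes a plate"),
        Caption(4, 6, "washes a plate"),
    ]
    result = answer_question(
        captions, "What is the main activity?", ["cooking", "washing dishes"], fake_llm
    )
    print(result)
```

Returning the confidence and explanation alongside the choice lets a caller discard or re-query low-confidence answers, which is one plausible use of such a module.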

Code & Models

Repositories

Agentic-Learning-AI-Lab/lifelong-memory (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning