Moment Sampling in Video LLMs for Long-Form Video QA

Mustafa Chasmai; Gauri Jagatap; Gouthaman KV; Grant Van Horn; Subhransu Maji; Andrea Fanelli

arXiv:2507.00033·cs.CV·July 2, 2025

Open Access

TL;DR

This paper introduces 'moment sampling', a novel, model-agnostic method that improves long-form VideoQA by selecting the most relevant frames based on question context, enhancing reasoning and efficiency.

Contribution

We propose a general-purpose, lightweight moment retrieval-guided frame sampling method to better select relevant frames for long-form VideoQA tasks.

Findings

01. Improves accuracy on four long-form VideoQA datasets

02. Reduces redundant frame processing and computational costs

03. Enhances reasoning capabilities in Video LLMs

Abstract

Recent advancements in video large language models (Video LLMs) have significantly advanced the field of video question answering (VideoQA). While existing methods perform well on short videos, they often struggle with long-range reasoning in longer videos. To scale Video LLMs for longer video content, frame sub-sampling (selecting frames at regular intervals) is commonly used. However, this approach is suboptimal, often leading to the loss of crucial frames or the inclusion of redundant information from multiple similar frames. Missing key frames impairs the model's ability to answer questions accurately, while redundant frames lead the model to focus on irrelevant video segments and increase computational resource consumption. In this paper, we investigate the use of a general-purpose text-to-video moment retrieval model to guide the frame sampling process. We propose "moment…
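To make the contrast in the abstract concrete, here is a minimal sketch comparing regular-interval sub-sampling with a relevance-guided alternative. It is not the paper's algorithm, which the truncated abstract does not spell out. The per-frame `scores` are a hypothetical stand-in for whatever a text-to-video moment retrieval model would return for a given question, and the function names, the greedy selection rule, and the `min_gap` parameter are all assumptions made for illustration.

```python
from typing import List, Sequence


def uniform_sample(num_frames: int, k: int) -> List[int]:
    """Pick k frame indices at regular intervals (the baseline in the abstract)."""
    if k <= 0 or num_frames <= 0:
        return []
    k = min(k, num_frames)
    step = num_frames / k
    # Take the centre of each of the k equal-width bins.
    return [int(step * i + step / 2) for i in range(k)]


def relevance_guided_sample(
    scores: Sequence[float], k: int, min_gap: int = 1
) -> List[int]:
    """Greedily pick up to k high-scoring frames that are at least min_gap apart.

    scores[i] is an assumed relevance of frame i to the question (hypothetical
    input, e.g. derived from a moment retrieval model). The min_gap constraint
    is an illustrative way to avoid near-duplicate neighbouring frames.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    for idx in order:
        if len(chosen) == k:
            break
        if all(abs(idx - c) >= min_gap for c in chosen):
            chosen.append(idx)
    return sorted(chosen)


# Toy example: a 100-frame video where the relevant moment is around frame 41.
toy_scores = [0.1] * 100
toy_scores[40], toy_scores[41], toy_scores[42], toy_scores[70] = 0.9, 0.95, 0.8, 0.7

uniform_frames = uniform_sample(100, 4)
guided_frames = relevance_guided_sample(toy_scores, k=2, min_gap=5)
```

In this toy setup the uniform sampler skips the high-relevance region around frame 41, while the guided sampler keeps one frame from it and drops its near-duplicate neighbours, which mirrors the two failure modes the abstract attributes to regular-interval sampling.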

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning