See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval

Mingyu Jeon; Sungjin Han; Jinkwon Hwang; Minchol Kwon; Jonghee Kim; Junyeong Kim

arXiv:2601.09350 · cs.CV · January 15, 2026

TL;DR

SMORE is a framework for video moment retrieval that reduces memory use through query-guided captions, query-aware importance modulation, and adaptive frame compression, and reports state-of-the-art results on multiple benchmarks.

Contribution

Introduces SMORE, a memory-efficient video moment retrieval method that preserves key information through query-guided encoding and adaptive frame compression, and reports better results than existing approaches.

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Balances memory usage against information retention.

03. Enhances video understanding without exceeding memory budgets.

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have improved image recognition and reasoning, but video tasks remain challenging because dense frame processing is memory-intensive. Existing Video Moment Retrieval (VMR) methods rely on sparse frame sampling, which risks information loss, especially in long videos. We propose SMORE (See MORE, store less), a framework that improves memory efficiency while maintaining high information resolution. SMORE (1) uses query-guided captions to encode semantics aligned with user intent, (2) applies query-aware importance modulation to highlight relevant segments, and (3) adaptively compresses frames to preserve key content while reducing redundancy. This enables efficient video understanding without exceeding memory budgets. Experiments show that SMORE achieves state-of-the-art performance on…
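
The abstract does not say how steps (2) and (3) are computed, so the following is only an illustrative sketch and not the authors' implementation. It assumes each frame is already represented by a feature vector, takes cosine similarity to a query vector as a stand-in for query-aware importance, and compresses by repeatedly merging the adjacent pair of segments with the lowest combined importance until a fixed budget is met. The names `importance_scores`, `compress_to_budget`, and the `budget` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def importance_scores(frames, query):
    """Assumed importance measure: cosine similarity to the query, clipped at 0."""
    return [max(0.0, cosine(f, query)) for f in frames]


def compress_to_budget(frames, scores, budget):
    """Merge adjacent low-importance segments until at most `budget` remain.

    Each segment keeps its frame span, an importance-weighted mean feature,
    and the maximum importance of the frames it covers.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    segs = [
        {"start": i, "end": i, "feat": list(f), "score": s}
        for i, (f, s) in enumerate(zip(frames, scores))
    ]
    while len(segs) > budget:
        # Pick the adjacent pair whose combined importance is lowest.
        i = min(range(len(segs) - 1),
                key=lambda k: segs[k]["score"] + segs[k + 1]["score"])
        a, b = segs[i], segs[i + 1]
        wa, wb = a["score"] + 1e-6, b["score"] + 1e-6  # avoid zero weights
        feat = [(wa * x + wb * y) / (wa + wb) for x, y in zip(a["feat"], b["feat"])]
        segs[i:i + 2] = [{
            "start": a["start"], "end": b["end"],
            "feat": feat, "score": max(a["score"], b["score"]),
        }]
    return segs


# Toy example: frames 2 and 3 match the query, the rest do not.
frames = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [1, 0]]
query = [0, 1]
scores = importance_scores(frames, query)
segments = compress_to_budget(frames, scores, budget=4)
spans = [(s["start"], s["end"]) for s in segments]
print(spans)  # [(0, 1), (2, 2), (3, 3), (4, 5)]
```

In this toy run the two query-relevant frames keep their own slots while the irrelevant frames on either side are merged, which is the general trade-off the abstract describes: every frame contributes to some stored segment, but storage is concentrated where the query is most relevant.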

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization