FreshMem: Brain-Inspired Frequency-Space Hybrid Memory for Streaming Video Understanding
Kangcong Li, Peng Ye, Lin Zhang, Chao Wang, Huafeng Qin, Tao Chen

TL;DR
FreshMem introduces a brain-inspired hybrid memory system for streaming video understanding, effectively balancing short-term detail retention with long-term coherence, and outperforms existing methods without additional training.
Contribution
The paper presents FreshMem, a novel frequency-space hybrid memory architecture inspired by brain perception, combining frequency coefficients and episodic clustering to improve streaming video comprehension.
Findings
Significantly improves baseline performance on multiple streaming benchmarks.
Outperforms several fine-tuned methods without additional training.
Provides an efficient, training-free solution for long-horizon video understanding.
Abstract
Transitioning Multimodal Large Language Models (MLLMs) from offline to online streaming video understanding is essential for continuous perception. However, existing methods lack flexible adaptivity, leading to irreversible detail loss and context fragmentation. To resolve this, we propose FreshMem, a Frequency-Space Hybrid Memory network inspired by the brain's logarithmic perception and memory consolidation. FreshMem reconciles short-term fidelity with long-term coherence through two synergistic modules: Multi-scale Frequency Memory (MFM), which projects overflowing frames into representative frequency coefficients, complemented by residual details to reconstruct a global historical "gist"; and Space Thumbnail Memory (STM), which discretizes the continuous stream into episodic clusters by employing an adaptive compression strategy to distill them into high-density space thumbnails.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
