MMA: Multimodal Memory Agent

Yihao Lu; Wanru Cheng; Zeyu Zhang; Hao Tang

arXiv:2602.16493·cs.CV·February 19, 2026

TL;DR

The paper introduces MMA, a multimodal memory agent that dynamically assesses memory reliability to improve decision-making, and presents MMA-Bench, a benchmark for evaluating belief dynamics and visual biases.

Contribution

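MMA scores each retrieved memory item for reliability, combining source credibility, temporal decay, and conflict-aware network consensus, and introduces MMA-Bench, a programmatically generated benchmark for belief dynamics and text-vision contradictions.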

Findings

01

On FEVER, MMA matches baseline accuracy while reducing variance by 35.2% and improving selective utility.

02

MMA enhances safety and accuracy on LoCoMo.

03

MMA achieves 41.18% Type-B accuracy on MMA-Bench.

Abstract

Long-horizon multimodal agents depend on external memory; however, similarity-based retrieval often surfaces stale, low-credibility, or conflicting items, which can trigger overconfident errors. We propose Multimodal Memory Agent (MMA), which assigns each retrieved memory item a dynamic reliability score by combining source credibility, temporal decay, and conflict-aware network consensus, and uses this signal to reweight evidence and abstain when support is insufficient. We also introduce MMA-Bench, a programmatically generated benchmark for belief dynamics with controlled speaker reliability and structured text-vision contradictions. Using this framework, we uncover the "Visual Placebo Effect", revealing how RAG-based agents inherit latent visual biases from foundation models. On FEVER, MMA matches baseline accuracy while reducing variance by 35.2% and improving selective utility; on…
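The reliability scoring described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formula, so everything here is an assumption made for illustration: the multiplicative combination, the exponential half-life decay, the credibility-weighted agreement used as "consensus", the abstention threshold, and all names (`MemoryItem`, `reliability`, `decide`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """Hypothetical retrieved memory item (fields are illustrative)."""
    text: str
    source_credibility: float  # assumed to lie in [0, 1]
    age_days: float            # time since the item was stored
    stance: int                # +1 supports the claim, -1 contradicts it


def temporal_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def consensus(item: MemoryItem, others: list) -> float:
    """Credibility-weighted share of other items that agree with this one.

    Returns a neutral 0.5 when there is nothing to compare against.
    """
    total = sum(o.source_credibility for o in others)
    if total == 0:
        return 0.5
    agree = sum(o.source_credibility for o in others if o.stance == item.stance)
    return agree / total


def reliability(item: MemoryItem, items: list, half_life_days: float = 30.0) -> float:
    """Assumed product of credibility, temporal decay, and consensus."""
    others = [o for o in items if o is not item]
    return (
        item.source_credibility
        * temporal_decay(item.age_days, half_life_days)
        * consensus(item, others)
    )


def decide(items: list, threshold: float = 0.2) -> str:
    """Reweight evidence by reliability; abstain when the margin is small."""
    scores = [reliability(i, items) for i in items]
    support = sum(s for s, i in zip(scores, items) if i.stance > 0)
    against = sum(s for s, i in zip(scores, items) if i.stance < 0)
    margin = support - against
    if abs(margin) < threshold:
        return "abstain"
    return "supported" if margin > 0 else "refuted"
```

In this sketch, two fresh, credible items that agree outweigh a stale, low-credibility contradiction, while a single old, low-credibility item falls below the threshold and leads to abstention.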

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Neurobiology of Language and Bilingualism