MMA: Multimodal Memory Agent
Yihao Lu, Wanru Cheng, Zeyu Zhang, Hao Tang

TL;DR
The paper introduces MMA, a multimodal memory agent that dynamically assesses memory reliability to improve decision-making, and presents MMA-Bench, a benchmark for evaluating belief dynamics and visual biases.
Contribution
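MMA scores each retrieved memory item for reliability by combining source credibility, temporal decay, and conflict-aware network consensus, and introduces MMA-Bench, a programmatically generated benchmark for evaluating belief dynamics and visual biases.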
Findings
On FEVER, MMA matches baseline accuracy while reducing variance by 35.2% and improving selective utility.
MMA enhances safety and accuracy on LoCoMo.
MMA achieves 41.18% Type-B accuracy on MMA-Bench.
Abstract
Long-horizon multimodal agents depend on external memory; however, similarity-based retrieval often surfaces stale, low-credibility, or conflicting items, which can trigger overconfident errors. We propose Multimodal Memory Agent (MMA), which assigns each retrieved memory item a dynamic reliability score by combining source credibility, temporal decay, and conflict-aware network consensus, and uses this signal to reweight evidence and abstain when support is insufficient. We also introduce MMA-Bench, a programmatically generated benchmark for belief dynamics with controlled speaker reliability and structured text-vision contradictions. Using this framework, we uncover the "Visual Placebo Effect", revealing how RAG-based agents inherit latent visual biases from foundation models. On FEVER, MMA matches baseline accuracy while reducing variance by 35.2% and improving selective utility; on…
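The reliability scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the three signals (source credibility, temporal decay, conflict-aware consensus) but not how they are combined, so the multiplicative form, the exponential half-life, the [-1, 1] consensus scale, the abstention threshold, and all names (MemoryItem, reliability, rerank_or_abstain) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    similarity: float   # retrieval similarity in [0, 1]
    credibility: float  # source credibility in [0, 1] (assumed scale)
    age_days: float     # time since the item was stored
    consensus: float    # agreement with related items in [-1, 1]; negative means conflict


def reliability(item: MemoryItem, half_life_days: float = 30.0) -> float:
    """Combine the three signals into one score in [0, 1] (assumed product form)."""
    decay = 0.5 ** (item.age_days / half_life_days)  # halves every half_life_days
    consensus_factor = (1.0 + item.consensus) / 2.0  # map [-1, 1] onto [0, 1]
    return item.credibility * decay * consensus_factor


def rerank_or_abstain(items, threshold: float = 0.2):
    """Reweight similarity by reliability; return None (abstain) if support is weak."""
    scored = sorted(
        ((item.similarity * reliability(item), item) for item in items),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None  # no item is both relevant and reliable enough
    return scored


if __name__ == "__main__":
    memory = [
        MemoryItem("old rumor", similarity=0.9, credibility=0.2, age_days=90, consensus=-0.5),
        MemoryItem("recent trusted note", similarity=0.7, credibility=0.9, age_days=0, consensus=0.8),
    ]
    ranked = rerank_or_abstain(memory)
    for weight, item in ranked:
        print(f"{weight:.3f}  {item.text}")
```

In this sketch the stale, low-credibility item ranks last despite having the higher similarity, and a query supported only by such items yields None, which stands in for abstention.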
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Neurobiology of Language and Bilingualism
