MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Shijian Wang; Jiarui Jin; Runhao Fu; Zexuan Yan; Xingjian Wang; Mengkang Hu; Eric Wang; Xiaoxi Li; Kangning Zhang; Li Yao; Wenxiang Jiao; Xuelian Cheng; Yuan Lu; Zongyuan Ge

arXiv:2603.27813 · cs.CV · March 31, 2026

TL;DR

MuSEAgent is a multimodal reasoning agent that abstracts past interactions into stateful, decision-level experiences and retrieves them at inference time to guide decisions across textual and visual sources. It outperforms trajectory-level experience retrieval baselines.

Contribution

The paper introduces a stateful experience learning framework that stores atomic decision experiences in a quality-filtered experience bank for adaptive multimodal reasoning, in place of trajectory-level retrieval.

Findings

01 MuSEAgent outperforms trajectory-level experience retrieval baselines.

02 The approach improves performance on fine-grained visual perception tasks.

03 The method enhances complex multimodal reasoning capabilities.

Abstract

Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage stateful experiences. Rather than relying on trajectory-level retrieval, we propose a stateful experience learning paradigm that abstracts interaction data into atomic decision experiences through hindsight reasoning. These experiences are organized into a quality-filtered experience bank that supports policy-driven experience retrieval at inference time. Specifically, MuSEAgent enables adaptive experience exploitation through complementary wide- and deep-search strategies, allowing the agent to dynamically retrieve multimodal guidance across diverse compositional semantic…
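
To make the idea of a quality-filtered experience bank concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Experience` schema, the `min_quality` threshold, and the token-overlap (Jaccard) similarity are all assumptions chosen for illustration. The abstract describes retrieval as policy-driven, whereas this sketch substitutes a fixed similarity ranking, and it does not attempt to model the wide- and deep-search strategies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Experience:
    """One atomic decision experience (hypothetical schema)."""
    state: str      # textual summary of the decision state
    action: str     # action taken in that state
    lesson: str     # guidance distilled in hindsight
    quality: float  # hypothetical quality score in [0, 1]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for a learned retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class ExperienceBank:
    min_quality: float = 0.5  # assumed filter threshold
    items: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> bool:
        """Store the experience only if it passes the quality filter."""
        if exp.quality < self.min_quality:
            return False
        self.items.append(exp)
        return True

    def retrieve(self, state: str, k: int = 2) -> list[Experience]:
        """Return the k stored experiences whose state best matches."""
        ranked = sorted(
            self.items,
            key=lambda e: jaccard(state, e.state),
            reverse=True,
        )
        return ranked[:k]


bank = ExperienceBank(min_quality=0.6)
bank.add(Experience("chart with small axis labels", "crop and zoom",
                    "Zoom in before reading small text.", 0.9))
bank.add(Experience("chart with small axis labels", "guess value",
                    "Unverified guess.", 0.2))  # rejected by the filter
bank.add(Experience("question about a landmark photo", "image search",
                    "Search the image before answering.", 0.8))

top = bank.retrieve("bar chart with tiny axis labels", k=1)
print(top[0].action, "->", top[0].lesson)
```

The point of the sketch is the granularity: each stored item is a single state-level decision with its hindsight lesson, so retrieval is keyed on the current state rather than on a whole past trajectory.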

Code & Models

Repositories

deepexperience/MuSEAgent (GitHub)
