Learning to Learn from Multimodal Experience

Xingyu Sui; Weixiang Zhao; Yongxin Tang; Yanyan Zhao; Yang Wu; Dandan Tu; Bing Qin

arXiv:2605.16857 · cs.AI · May 19, 2026

TL;DR

This paper introduces a novel paradigm where agents learn to adaptively structure and utilize multimodal experience, improving performance and generalization in complex environments.

Contribution

It proposes a learnable memory framework that dynamically organizes multimodal experience based on task needs, moving beyond fixed memory schemas.

Findings

01. Adaptive memory design improves agent performance across tasks
02. Learned memory structures enhance generalization in multimodal environments
03. Dynamic memory organization outperforms static schemas

Abstract

Experience-driven learning has emerged as a promising paradigm for enabling agents to improve from interaction trajectories by accumulating and reusing past experience. However, existing approaches are predominantly developed in textual settings and rely on manually designed memory schemas, limiting their applicability to multimodal environments. In real-world scenarios, experience is inherently multimodal, involving heterogeneous signals across perception, reasoning, and action, which makes effective memory design significantly more challenging. In particular, the optimal way to structure and utilize multimodal experience is highly task-dependent and evolves over time, rendering fixed memory designs insufficient. In this work, we propose a new paradigm, learning to learn from multimodal experience, which shifts memory design from a predefined component to an adaptive and learnable…
