Learning to Learn from Multimodal Experience
Xingyu Sui, Weixiang Zhao, Yongxin Tang, Yanyan Zhao, Yang Wu, Dandan Tu, Bing Qin

TL;DR
This paper introduces a paradigm in which agents learn to structure and use multimodal experience adaptively, improving performance and generalization in complex environments.
Contribution
The paper proposes a learnable memory framework that organizes multimodal experience dynamically according to task needs, rather than relying on a fixed memory schema.
Findings
Adaptive memory design improves agent performance across tasks
Learned memory structures enhance generalization in multimodal environments
Dynamic memory organization outperforms static schemas
Abstract
Experience-driven learning has emerged as a promising paradigm for enabling agents to improve from interaction trajectories by accumulating and reusing past experience. However, existing approaches are predominantly developed in textual settings and rely on manually designed memory schemas, limiting their applicability to multimodal environments. In real-world scenarios, experience is inherently multimodal, involving heterogeneous signals across perception, reasoning, and action, which makes effective memory design significantly more challenging. In particular, the optimal way to structure and utilize multimodal experience is highly task-dependent and evolves over time, rendering fixed memory designs insufficient. In this work, we propose a new paradigm, learning to learn from multimodal experience, which shifts memory design from a predefined component to an adaptive and learnable…
