Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models
Sureyya Akin, Shruti T. Tiwari, Ram Bhattacharya, Sagar A. Raman, Kiran Mohanty, Sita Krishnan

TL;DR
This paper presents M-S2L, a framework that combines multimodal perception and social learning to enable AI agents to develop emergent social behaviors and improve collaborative problem-solving in complex tasks.
Contribution
It introduces a novel multimodal socialized learning framework that integrates vision, text, and social learning pathways with reinforcement learning for multi-agent collaboration.
Findings
M-S2L agents outperform baselines in task completion and efficiency.
Communication protocols emerge that combine visual pointers with text.
Agents demonstrate shared awareness, role specialization, and adaptive problem-solving.
Abstract
This research introduces the Multimodal Socialized Learning Framework (M-S2L), designed to foster emergent social intelligence in AI agents by integrating Multimodal Large Language Models (M-LLMs) with social learning mechanisms. The framework equips agents with multimodal perception (vision and text) and structured action capabilities, enabling physical manipulation and grounded multimodal communication (e.g., text with visual pointers). M-S2L combines direct reinforcement learning with two novel social learning pathways: multimodal observational learning and communication-driven learning from feedback, augmented by an episodic memory system for long-term social context. We evaluate M-S2L in a Collaborative Assembly Environment (CAE), where agent teams must construct complex devices from ambiguous blueprints under informational asymmetry. Across tasks of increasing complexity, M-S2L…
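To make "text with visual pointers" and the episodic memory concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the names `VisualPointer`, `Message`, and `EpisodicMemory`, the bounding-box representation, and the fixed-capacity eviction policy are all assumptions introduced here.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VisualPointer:
    """Hypothetical reference to an image region, in pixel coordinates."""
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class Message:
    """A text utterance, optionally grounded by one visual pointer."""
    sender: str
    text: str
    pointer: Optional[VisualPointer] = None


class EpisodicMemory:
    """Bounded log of past messages (assumed policy: drop the oldest first)."""

    def __init__(self, capacity: int = 100) -> None:
        # A deque with maxlen discards the oldest entry once it is full.
        self._events = deque(maxlen=capacity)

    def store(self, message: Message) -> None:
        self._events.append(message)

    def recall_from(self, sender: str) -> List[Message]:
        """Return the stored messages from one sender, oldest first."""
        return [m for m in self._events if m.sender == sender]


# Usage: with capacity 2, the first message is evicted by the third.
memory = EpisodicMemory(capacity=2)
memory.store(Message("agent_a", "Attach this part", VisualPointer(10, 20, 32, 32)))
memory.store(Message("agent_b", "Which slot?"))
memory.store(Message("agent_a", "The left one", VisualPointer(0, 40, 16, 16)))
recalled = memory.recall_from("agent_a")
```

A real system would presumably retrieve memories by relevance rather than by sender, and would tie pointers to a specific observation frame. The sketch only shows how a message can carry both modalities and how a bounded memory can hold social context.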
Taxonomy
Topics: Language and cultural evolution · Multimodal Machine Learning Applications · Speech and dialogue systems
