Sensorimotor Self-Recognition in Multimodal Large Language Model-Driven Robots
Iñaki Dellibarda Varela, Pablo Romero-Sorozabal, Diego Torricelli, Gabriel Delgado-Oleas, Jose Ignacio Serrano, Maria Dolores del Castillo Sobrino, Eduardo Rocon, Manuel Cebrian

TL;DR
This paper explores how multimodal large language models integrated into robots can develop self-recognition abilities through sensorimotor experiences, advancing autonomous and self-aware AI systems.
Contribution
It demonstrates that multimodal LLMs can achieve self-recognition in robots by integrating sensory data and memory, revealing the underlying mechanisms of minimal self-awareness.
Findings
Robots driven by multimodal LLMs exhibit robust environmental awareness and self-identification.
Sensory integration influences dimensions of the minimal self and their coordination.
Structured and episodic memory are essential for self-recognition, as confirmed by ablation tests.
Abstract
Self-recognition -- the ability to maintain an internal representation of one's own body within the environment -- underpins intelligent, autonomous behavior. As a foundational component of the minimal self, self-recognition provides the initial substrate from which higher forms of self-awareness may eventually emerge. Recent advances in large language models achieve human-like performance in tasks integrating multimodal information, raising growing interest in the embodiment capabilities of AI agents deployed on nonhuman platforms such as robots. We investigate whether multimodal LLMs can develop self-recognition through sensorimotor experience by integrating an LLM into an autonomous mobile robot. The system exhibits robust environmental awareness, self-identification, and predictive awareness, enabling it to infer its robotic nature and motion characteristics. Structural equation…
