RynnEC: Bringing MLLMs into Embodied World

Ronghao Dang; Yuqian Yuan; Yunxuan Mao; Kehan Li; Jiangpin Liu; Zhikai Wang; Xin Li; Fan Wang; Deli Zhao

arXiv:2508.14160 · cs.CV · November 19, 2025

Open Access · 3 Models · 1 Dataset

TL;DR

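RynnEC is a video multimodal large language model for embodied cognition. It adds a region encoder and a mask decoder to a general-purpose vision-language foundation model to support region-level video interaction, and it reports state-of-the-art performance in object property understanding, object segmentation, and spatial reasoning.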

Contribution

The paper introduces RynnEC, a compact, region-centric video model, together with RynnEC-Bench, a region-centered benchmark for evaluating embodied cognitive capabilities, and an egocentric-video pipeline for generating embodied cognition data.

Findings

01 State-of-the-art performance in object property understanding

02 Effective object segmentation and spatial reasoning

03 A new egocentric video pipeline for data generation

Abstract

We introduce RynnEC, a video multimodal large language model designed for embodied cognition. Built upon a general-purpose vision-language foundation model, RynnEC incorporates a region encoder and a mask decoder, enabling flexible region-level video interaction. Despite its compact architecture, RynnEC achieves state-of-the-art performance in object property understanding, object segmentation, and spatial reasoning. Conceptually, it offers a region-centric video paradigm for the brain of embodied agents, providing fine-grained perception of the physical world and enabling more precise interactions. To mitigate the scarcity of annotated 3D datasets, we propose an egocentric video based pipeline for generating embodied cognition data. Furthermore, we introduce RynnEC-Bench, a region-centered benchmark for evaluating embodied cognitive capabilities. We anticipate that RynnEC will advance…

Code & Models

Datasets

Alibaba-DAMO-Academy/RynnEC-Bench
dataset · 158 downloads

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques