EMAC+: Embodied Multimodal Agent for Collaborative Planning with VLM+LLM

Shuang Ao; Flora D. Salim; Simon Khan

arXiv:2505.19905·cs.AI·October 17, 2025

Open Access · 4 Reviews

TL;DR

EMAC+ is an embodied multimodal agent that enhances LLM-based planning in robotics by integrating visual feedback from a VLM, enabling dynamic, environment-aware decision-making and improved task performance.

Contribution

This work introduces EMAC+, a bidirectional training framework that allows LLMs to learn from visual interactions, addressing key limitations of prior multimodal agents in robotics.

Findings

01. Achieves superior performance on ALFWorld and RT-1 benchmarks.

02. Demonstrates robustness to noisy visual observations.

03. Enables LLMs to internalize environment dynamics through interaction.

Abstract

Although LLMs demonstrate proficiency in several text-based reasoning and planning tasks, their implementation in robotics control is constrained by significant deficiencies: (1) LLM agents are designed to work mainly with textual inputs rather than visual conditions; (2) Current multimodal agents treat LLMs as static planners, which separates their reasoning from environment dynamics, resulting in actions that do not take domain-specific knowledge into account; and (3) LLMs are not designed to learn from visual interactions, which makes it harder for them to make better policies for specific domains. In this paper, we introduce EMAC+, an Embodied Multimodal Agent that collaboratively integrates LLM and VLM via a bidirectional training paradigm. Unlike existing methods, EMAC+ dynamically refines high-level textual plans generated by an LLM using real-time feedback from a VLM executing…
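The plan-refinement idea in the abstract can be illustrated with a toy closed loop. This is a minimal sketch under stated assumptions, not the authors' implementation: `ToyEnv`, `stub_planner`, and `run_episode` are hypothetical names, the planner is a hand-written stub standing in for an LLM, and the environment's textual observations stand in for VLM feedback. It shows only inference-time re-planning from feedback; the bidirectional training described in the paper is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in for an embodied environment."""
    drawer_open: bool = False
    holding_key: bool = False

    def step(self, action: str) -> str:
        # Return a textual observation (a stand-in for VLM feedback).
        if action == "open drawer":
            self.drawer_open = True
            return "drawer is open"
        if action == "take key":
            if not self.drawer_open:
                return "failed: drawer is closed"
            self.holding_key = True
            return "holding key"
        return f"failed: unknown action {action}"


def stub_planner(goal: str, feedback: list[str]) -> list[str]:
    """Hypothetical stand-in for an LLM planner; revises the plan from feedback."""
    if any("drawer is closed" in f for f in feedback):
        return ["open drawer", "take key"]
    return ["take key"]  # naive first plan that ignores the drawer state


def run_episode(goal: str, env: ToyEnv, max_rounds: int = 3):
    """Plan, execute, and re-plan on failure, accumulating feedback."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        plan = stub_planner(goal, feedback)
        failed = False
        for action in plan:
            obs = env.step(action)
            feedback.append(obs)
            if obs.startswith("failed"):
                failed = True
                break  # stop executing and ask the planner to revise
        if not failed:
            return True, feedback
    return False, feedback


if __name__ == "__main__":
    ok, log = run_episode("get the key", ToyEnv())
    print(ok, log)
```

In this toy run the first plan fails because the drawer is closed, the failure message is fed back to the planner, and the revised plan succeeds on the second round.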

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 0 · Confidence 1

Strengths

N/A

Weaknesses

N/A

Reviewer 02 · Rating 0 · Confidence 5

Strengths

No

Weaknesses

The paper PDF is blank.

Reviewer 03 · Rating 0 · Confidence 5

Strengths

None

Weaknesses

None

Reviewer 04 · Rating 0 · Confidence 5

Strengths

N/A

Weaknesses

N/A

Taxonomy

Topics: Multi-Agent Systems and Negotiation · AI-based Problem Solving and Planning · Semantic Web and Ontologies