Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
Yang Zhang, Shixin Yang, Chenjia Bai, Fei Wu, Xiu Li, Zhen Wang, Xuelong Li

TL;DR
This paper introduces ReAd, a reinforcement learning-inspired framework that enhances multi-agent collaboration with large language models by improving efficiency and success rates through self-refinement of plans.
Contribution
It proposes a novel ReAd framework that uses advantage-weighted regression for efficient LLM grounding in multi-agent tasks, reducing queries and interactions.
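As a rough illustration of the advantage-weighted regression idea named above, the sketch below computes the exponential weights used in the generic single-agent form of that technique. It is not the paper's multi-agent formulation; the function name `awr_weights`, the temperature `beta`, and the clipping value `max_weight` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def awr_weights(advantages: Sequence[float], beta: float = 1.0,
                max_weight: float = 20.0) -> List[float]:
    """Generic advantage-weighted regression weights: min(exp(A / beta), max_weight).

    Hypothetical helper: `beta` (temperature) and `max_weight` (clip) are
    illustrative defaults, not values taken from the paper.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Higher-advantage samples get exponentially larger regression weights.
    return [min(math.exp(a / beta), max_weight) for a in advantages]


weights = awr_weights([-1.0, 0.0, 1.0])
```

Samples with a higher estimated advantage receive a larger weight, so a regression objective built on these weights favors the actions the critic rates as better.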
Findings
ReAd outperforms baselines in success rate.
ReAd significantly reduces agent interactions.
ReAd decreases LLM query rounds.
Abstract
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. Especially, LLM planning for multi-agent collaboration requires communication of agents or credit assignment as the feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer to generate actions that maximize the advantage function. It endows the LLM with the foresight to…
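The refinement loop described in the abstract can be sketched roughly as follows, assuming a planner callable that stands in for the LLM and a critic callable that stands in for the learned advantage function. Every name here (`refine_plan`, `toy_propose`, `toy_advantage`, the threshold, and the toy scores) is hypothetical and chosen only for illustration; the paper's actual prompts, critic, and acceptance rule are not reproduced.

```python
from typing import Callable, List, Tuple

Feedback = List[Tuple[str, float]]


def refine_plan(propose: Callable[[str, Feedback], str],
                advantage: Callable[[str, str], float],
                state: str,
                threshold: float = 0.0,
                max_rounds: int = 3) -> Tuple[str, float, int]:
    """Re-query the planner until the critic's advantage score clears a threshold.

    `propose` stands in for an LLM planner, `advantage` for a learned critic.
    Returns (best action, its score, number of rejected proposals).
    """
    feedback: Feedback = []
    best_action, best_score = "", float("-inf")
    for _ in range(max_rounds):
        action = propose(state, feedback)
        score = advantage(state, action)
        if score > best_score:
            best_action, best_score = action, score
        if score >= threshold:
            break  # accept without executing the action in the environment
        # Rejected: return the low score to the planner as feedback.
        feedback.append((action, score))
    return best_action, best_score, len(feedback)


# Toy stand-ins (assumptions, not the paper's models).
CANDIDATES = ["wait", "move_left", "pick_cube"]
TOY_SCORES = {"wait": -0.5, "move_left": -0.1, "pick_cube": 0.8}


def toy_propose(state: str, feedback: Feedback) -> str:
    # Cycle through candidates; a real planner would condition on the feedback.
    return CANDIDATES[len(feedback) % len(CANDIDATES)]


def toy_advantage(state: str, action: str) -> float:
    return TOY_SCORES[action]


result = refine_plan(toy_propose, toy_advantage, "s0")
```

In this toy run the first two proposals score below the threshold and are rejected by the critic alone, which is the sense in which such a loop could save environment interactions.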
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
S1: The paper introduces a method that eliminates the need for direct physical interaction with the environment, instead using a pre-trained critic to approximate an advantage score and replan the action internally. S2: The authors provide an extensive theoretical justification and guarantee for advantage decomposition in the multi-agent setting and empirically demonstrate the success of their method on two different benchmarks in multiple settings. S3: Their method is able to generate dire
W1: Lack of discussion about generalization to unseen partners. In practical coordination scenarios, partner agents might not employ the same algorithms as the ego-agent. It is essential for coordination methods to be robust to unseen partners. W2: Limited test coverage on the Overcooked-AI benchmark: Reflexion and ReAct appear to have a 0% task completion rate. Experiments should consider more time steps as it might turn out that although Reflexion or ReAct take more time, they could have a hi
The paper presents an innovative approach by leveraging MARL theories to enhance collaborative behavior in LLM-based agents within multi-agent environments. The theoretical foundation is well-developed, allowing readers to gain a deep understanding of the approach. Furthermore, the experimental results and their detailed analysis contribute valuable insights to the research community.
- Since the paper draws on MARL theory, including some MARL methods as baselines would further strengthen the robustness of the results. - Even with the _Difficult Variants_ of RoCoBench, the proposed method achieves a nearly 100 percent success rate, so the task may still be too easy or oversimplified. - The experimental results showing a 0% success rate for all baselines on the Overcooked-AI benchmark need further explanation. The original overcooked-ai paper reporte
The idea of incorporating some critic in the LLM planner is promising and can mitigate hallucination among the LLMs. It's an interesting and important problem for the AI/LLM community. The paper is well written and easy to understand. The experiment is well done and the analysis is thorough and detailed. I particularly appreciate the extensive discussions and analyses presented in the appendix, as they openly address the model's robustness and limitations. This level of detail not only streng
1. Model: The main weakness of the proposed algorithm is its limited generalizability. It seems that the critic network is trained over hundreds of plans/trajectories in each domain. If this is the case, the model becomes very domain-specific and would need a new critic for each new domain. That defeats the purpose of using an LLM as a planner, which is meant to be fast and easily adaptable to different domains so you don't have to build any domain-specific model. I wonder if you can use LLM (mayb
1. Grounding LLM's reasoning prowess in physical environments is a highly promising and crucial field that can help leverage continuously evolving LLMs for real-world problem solving. 2. Addressing interaction costs in embodied environments is a challenging yet practical problem that can resolve various factors, including safety issues and LLM inference costs. 3. The paper is well-organized and written clearly without confusion, making it easily accessible to readers.
1. The main concern is the novelty of the contribution. Applying MARL's theoretical aspects to embodied multi-agent collaboration appears quite straightforward, and seems independent from the authors' stated goal (as written in Line 77) of enhancing LLMs' reasoning capabilities. 2. While the introduction addresses interaction efficiency as a major issue in prior work, the data collection efficiency of the proposed learning method should also be discussed. This TRPO-style on-policy learning appr
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Multi-Agent Systems and Negotiation
