Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Planning Case Study
Shangding Gu

TL;DR
This paper introduces a bi-directional feedback framework where Large Language Models and Reinforcement Learning agents cooperatively enhance each other's performance through recursive help, improving planning and reasoning in complex tasks.
Contribution
It presents a novel teacher-student framework with recursive feedback between LLMs and RL models, enabling mutual enhancement in a cooperative multi-agent setting.
Findings
Bi-directional feedback improves task performance.
Recursive help accelerates learning and exploration.
Empirical results validate the effectiveness of the proposed method.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities, such as planning and reasoning, that can benefit reinforcement learning (RL) models. However, how LLMs and RL models should collaborate remains an open problem. In this study, we employ a teacher-student learning framework to tackle this problem, specifically by using RL models to offer feedback to LLMs and using LLMs to provide high-level information to RL models in a cooperative multi-agent setting. Within this framework, the LLM acts as a teacher, while the RL model acts as a student. The two agents cooperatively assist each other through a process of recursive help, such as "I help you help I help." The LLM agent supplies abstract information to the RL agent, enabling efficient exploration and policy improvement. In turn, the RL agent offers feedback to the LLM agent, providing valuable, real-time…
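The teacher-student loop described in the abstract can be illustrated with a small toy sketch. This is not the paper's implementation: the LLM is replaced by a hypothetical `StubTeacher` that suggests a direction and averages the returns it hears back, the student is a plain tabular Q-learner on a hypothetical six-state chain, and every name and hyperparameter below is an assumption chosen for illustration.

```python
import random

N_STATES = 6          # hypothetical chain 0..5; reaching state 5 yields reward 1
ACTIONS = (-1, +1)    # move left, move right
GOAL = N_STATES - 1

class StubTeacher:
    """Stand-in for the LLM teacher (assumption: no real LLM is called)."""
    def __init__(self):
        self.scores = {a: 0.0 for a in ACTIONS}  # running mean return per hint
        self.counts = {a: 0 for a in ACTIONS}

    def hint(self):
        # Try each hint once, then keep the one with the best feedback so far.
        for a in ACTIONS:
            if self.counts[a] == 0:
                return a
        return max(ACTIONS, key=lambda a: self.scores[a])

    def feedback(self, hint, episode_return):
        # Student -> teacher direction: update the running mean for this hint.
        self.counts[hint] += 1
        self.scores[hint] += (episode_return - self.scores[hint]) / self.counts[hint]

def run_episode(q, hint, rng, eps=0.1, alpha=0.5, gamma=0.9, bonus=0.1, max_steps=20):
    """One Q-learning episode; the hint biases greedy action selection."""
    s, total = 0, 0.0
    for _ in range(max_steps):
        if rng.random() < eps:
            a = rng.choice(ACTIONS)
        else:
            # Teacher -> student direction: small preference for the hinted action.
            a = max(ACTIONS, key=lambda x: q[(s, x)] + (bonus if x == hint else 0.0))
        s2 = min(max(s + a, 0), GOAL)
        done = s2 == GOAL
        r = 1.0 if done else 0.0
        best_next = 0.0 if done else max(q[(s2, x)] for x in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        total += r
        s = s2
        if done:
            break
    return total

def train(episodes=50, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    teacher = StubTeacher()
    for _ in range(episodes):
        h = teacher.hint()              # teacher helps student
        ret = run_episode(q, h, rng)    # student explores and learns
        teacher.feedback(h, ret)        # student helps teacher
    return q, teacher

if __name__ == "__main__":
    q, teacher = train()
    print("teacher hint scores:", teacher.scores)
    print("Q at start state:", q[(0, -1)], q[(0, +1)])
```

In this sketch the hint only shapes action selection rather than the reward, so the student's learned values stay tied to the environment's own reward. A real system would swap `StubTeacher` for prompts to an LLM and would need a richer feedback signal than a scalar return.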
Taxonomy
Topics: Topic Modeling · Software Engineering Research
