Teaching Models to Improve on Tape
Liat Bezalel, Eyal Orgad, Amir Globerson

TL;DR
This paper introduces CORGI, a reinforcement learning framework that enhances large language models' ability to generate content satisfying specific constraints through simulated interactions and feedback.
Contribution
The paper presents CORGI, a novel RL-based training method that improves LLMs' constrained generation by leveraging simulated interactions and feedback, enabling better generalization.
Findings
CORGI consistently outperforms a baseline RL method that does not incorporate conversational feedback.
CORGI enables meta-learning for better generalization to new tasks.
Conversational optimization with RL significantly improves controlled generation.
Abstract
Large Language Models (LLMs) often struggle when prompted to generate content under specific constraints. However, in such cases it is often easy to check whether these constraints are satisfied or violated. Recent works have shown that LLMs can benefit from such "corrective feedback". Here we claim that this skill of LLMs can be significantly enhanced via training. We introduce an RL framework for teaching models to use such rewards, by simulating interaction sessions, and rewarding the model according to its ability to satisfy the constraints. We refer to our method as CORGI (Controlled Generation with RL for Guided Interaction), and evaluate it on a variety of controlled generation tasks using unlabeled training data. We find that CORGI consistently outperforms the baseline reinforcement learning method that does not incorporate conversational feedback. Furthermore, CORGI's…
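The interaction loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the word-count constraint, the `toy_policy` stand-in for a model, the `Turn` record, and the reward of 1/(number of turns used) are all hypothetical choices made only to show the shape of a simulated session in which a checker supplies corrective feedback and the session outcome becomes a scalar reward.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    attempt: str      # text produced by the policy on this turn
    feedback: str     # corrective message from the constraint checker
    satisfied: bool   # whether the constraint was met


def check_word_count(text: str, target: int) -> Tuple[bool, str]:
    """Hypothetical constraint: the output must have exactly `target` words."""
    n = len(text.split())
    if n == target:
        return True, "Constraint satisfied."
    direction = "long" if n > target else "short"
    return False, f"Output has {n} words; expected {target}. It is too {direction}."


def simulate_session(
    policy: Callable[[str, List[Turn]], str],
    prompt: str,
    checker: Callable[[str], Tuple[bool, str]],
    max_turns: int = 3,
) -> Tuple[List[Turn], float]:
    """Run one simulated session and return (history, reward).

    The reward scheme is an assumption for illustration: 1 / turns_used
    on success, 0.0 if the constraint is never satisfied.
    """
    history: List[Turn] = []
    for turn_index in range(max_turns):
        attempt = policy(prompt, history)
        ok, feedback = checker(attempt)
        history.append(Turn(attempt, feedback, ok))
        if ok:
            return history, 1.0 / (turn_index + 1)
    return history, 0.0


def toy_policy(prompt: str, history: List[Turn]) -> str:
    """Stand-in for an LLM: edits its previous attempt based on the feedback."""
    if not history:
        return "the quick brown fox jumps over the lazy dog"
    words = history[-1].attempt.split()
    if "too long" in history[-1].feedback:
        return " ".join(words[:-1])      # drop one word
    return " ".join(words + ["again"])   # add one word
```

In an actual RL setup, the returned reward would drive a policy-gradient update of the model that plays the role of `toy_policy`; that update step is omitted here because the summary does not specify it.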
Taxonomy
Topics: Teaching and Learning Programming
