TL;DR
This paper introduces a novel learning algorithm called Interruptible Collaborative Roleplayer (ICR) that trains AI agents to be more effective partners in multi-party collaboration by increasing group common ground and effectively handling interventions.
Contribution
The paper presents ICR, a new partner-aware learning algorithm that improves the collaborative behavior of LLM agents in multi-party tasks and addresses limitations of standard RLHF training.
Findings
ICR outperforms standard methods in promoting common ground convergence.
ICR enables agents to explore more diverse solutions.
Standard RLHF-trained agents tend to ignore well-meaning interventions.
Abstract
Large Language Models (LLMs) are increasingly being deployed in agentic settings where they act as collaborators with humans. Therefore, it is increasingly important to be able to evaluate their abilities to collaborate effectively in multi-turn, multi-party tasks. In this paper, we build on the AI alignment and safe interruptibility literature to offer novel theoretical insights on collaborative behavior between LLM-driven collaborator agents and an intervention agent. Our goal is to learn an ideal partner-aware collaborator that increases the group's common-ground (CG) alignment on task-relevant propositions by intelligently collecting information provided in interventions by a partner agent. We show how LLM agents trained using standard RLHF and related approaches are naturally inclined to ignore possibly well-meaning interventions, which makes increasing group common ground…