TL;DR
This paper investigates problem drift in multi-agent debate with large language models. It analyzes the causes and prevalence of drift and proposes methods to detect and mitigate it, with the aim of improving task performance.
Contribution
It introduces the concept of problem drift, quantifies its occurrence across ten tasks, and proposes DRIFTJudge and DRIFTPolicy to detect and reduce drift in multi-agent debates.
Findings
Generative tasks drift often (76-89%), which the authors attribute to the subjectivity of the answer space, compared to high-complexity tasks (7-21%).
Eight experts analyzed 170 debates to identify causes of drift, including lack of progress and low-quality feedback.
DRIFTPolicy mitigates 31% of problem drift cases.
Abstract
Multi-agent debate - multiple instances of large language models discussing problems in turn-based interaction - has shown promise for solving knowledge and reasoning tasks. However, these methods show limitations when solving complex problems that require longer reasoning chains. We analyze how multi-agent debate drifts away from the initial problem over multiple turns, thus harming task performance. We define this phenomenon as problem drift and quantify its presence across ten tasks (i.e., three generative, three knowledge, three reasoning, and one instruction-following task). We find that generative tasks drift often due to the subjectivity of the answer space (76-89%), compared to high-complexity tasks (7-21%). To identify the reasons, eight human experts analyze 170 multi-agent debates suffering from problem drift. We find the most common issues related to this drift are the lack…