Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

Samuel Arnesen; David Rein; Julian Michael

arXiv:2409.16636·cs.CL·September 26, 2024

TL;DR

This paper shows that when language models are trained via self-play to win debates, language-model judges answer long-context reading comprehension questions more accurately. No comparable gain appears for consultancy models, which are trained to persuade a judge without an opposing debater present.

Contribution

It trains debate models on data generated via self-play and compares them against novel consultancy baselines, testing whether optimizing debaters to win improves the accuracy of the judges who evaluate them.

Findings

01

Judges answer questions more accurately when evaluating models optimized to win debates; no such relationship holds for consultancy models.

02

Quantitative and qualitative comparisons provide evidence that debate training encourages stronger and more informative arguments.

03

Debate shows promise for providing high-quality supervision on tasks that are difficult to evaluate directly.

Abstract

We test the robustness of debate as a method of scalable oversight by training models to debate with data generated via self-play. In a long-context reading comprehension task, we find that language model based evaluators answer questions more accurately when judging models optimized to win debates. By contrast, we find no such relationship for consultancy models trained to persuade a judge without an opposing debater present. In quantitative and qualitative comparisons between our debate models and novel consultancy baselines, we find evidence that debate training encourages stronger and more informative arguments, showing promise that it can help provide high-quality supervision for tasks that are difficult to directly evaluate.
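The central quantity in the abstract is judge accuracy, compared across the debate and consultancy protocols. As a minimal sketch of how that metric could be computed, assuming a hypothetical per-round record layout (the `JudgedRound` class and its field names are illustrative and are not taken from the paper or its repository):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JudgedRound:
    """One judged round. Hypothetical record layout, not from the paper's code."""
    protocol: str        # "debate" or "consultancy"
    judge_choice: str    # answer the judge selected, e.g. "A" or "B"
    correct_answer: str  # ground-truth answer to the question


def judge_accuracy(rounds: List[JudgedRound], protocol: str) -> float:
    """Fraction of rounds under `protocol` in which the judge chose the correct answer."""
    selected = [r for r in rounds if r.protocol == protocol]
    if not selected:
        raise ValueError(f"no rounds recorded for protocol {protocol!r}")
    hits = sum(r.judge_choice == r.correct_answer for r in selected)
    return hits / len(selected)
```

Calling `judge_accuracy(rounds, "debate")` and `judge_accuracy(rounds, "consultancy")` on the same question set, once per training checkpoint, would give the kind of comparison the abstract describes: whether accuracy rises as the debaters or consultants are optimized further.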

Code & Models

Repositories

samuelarnesen/nyu-debate-modeling
PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Law