Measuring Progress on Scalable Oversight for Large Language Models
Samuel R. Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, Scott Heiner, Kamilė Lukošiūtė, Amanda Askell, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Christopher Olah, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li

TL;DR
This paper explores scalable oversight for large language models. It proposes an experimental design built around tasks where human specialists succeed but unaided humans and current AI systems fail, and reports proof-of-concept results suggesting that this kind of supervision research is feasible with today's models.
Contribution
It introduces an experimental framework for studying scalable oversight and provides proof-of-concept results showing human-AI collaboration improves performance on complex tasks.
Findings
Humans assisted by a model outperform both the model alone and unaided humans on two question-answering tasks: MMLU and time-limited QuALITY.
Chat-based interactions with language models can enhance human performance.
Scalable oversight research is feasible with current large language models.
Abstract
Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that potentially outperform us on most skills relevant to the task at hand. Empirical work on this problem is not straightforward, since we do not yet have systems that broadly exceed our abilities. This paper discusses one of the major ways we think about this problem, with a focus on ways it can be studied empirically. We first present an experimental design centered on tasks for which human specialists succeed but unaided humans and current general AI systems fail. We then present a proof-of-concept experiment meant to demonstrate a key feature of this experimental design and show its viability with two question-answering tasks: MMLU and time-limited QuALITY. On these tasks, we find that human participants who interact with an unreliable…
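The comparison at the heart of this design can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the condition names (`model_alone`, `human_alone`, `human_with_model`), the helper functions, and the toy outcomes are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def accuracy(outcomes: List[bool]) -> float:
    """Fraction of questions answered correctly in one condition."""
    if not outcomes:
        raise ValueError("no outcomes recorded for this condition")
    return sum(outcomes) / len(outcomes)


def compare_conditions(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Map each experimental condition to its accuracy."""
    return {name: accuracy(outcomes) for name, outcomes in results.items()}


def assisted_beats_baselines(
    acc: Dict[str, float],
    assisted: str = "human_with_model",
    baselines: Sequence[str] = ("model_alone", "human_alone"),
) -> bool:
    """True if the assisted condition is strictly more accurate than every baseline."""
    return all(acc[assisted] > acc[b] for b in baselines)


# Made-up placeholder outcomes (True = correct answer); NOT results from the paper.
toy_results = {
    "model_alone": [True, False, True, False],
    "human_alone": [True, False, False, False],
    "human_with_model": [True, True, True, False],
}

acc = compare_conditions(toy_results)
print(acc)
print(assisted_beats_baselines(acc))
```

A real analysis would also need per-task breakdowns and some measure of statistical uncertainty, both of which this sketch omits.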
Taxonomy
Topics: Topic Modeling · Multi-Agent Systems and Negotiation · Speech and dialogue systems
