Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Zhiqing Sun; Longhui Yu; Yikang Shen; Weiyang Liu; Yiming Yang; Sean Welleck; Chuang Gan

arXiv:2403.09472 · cs.LG · December 11, 2024 · 1 citation


TL;DR

This paper introduces a scalable alignment method in which reward models trained on easy tasks are used to evaluate and improve performance on harder tasks, with the aim of continuing to improve AI systems on complex reasoning beyond the level that human supervision directly covers.

Contribution

The paper proposes an easy-to-hard generalization approach in which reward models trained on simple tasks are used to evaluate and improve performance on difficult tasks, so that progress is not capped by the difficulty of the tasks humans can supervise.

Findings

01

Reward models trained on easy tasks effectively evaluate harder tasks.

02

The approach achieves 34.0% accuracy on MATH500 while using human supervision only on easy problems.

03

Suggests a path for AI systems to keep improving on complex reasoning beyond the level of human supervision.

Abstract

Current AI alignment methodologies rely on human-provided demonstrations or judgments, and the learned capabilities of AI systems would be upper-bounded by human capabilities as a result. This raises a challenging research question: How can we keep improving the systems when their capabilities have surpassed the levels of humans? This paper answers this question in the context of tackling hard reasoning tasks (e.g., level 4-5 MATH problems) via learning from human annotations on easier tasks (e.g., level 1-3 MATH problems), which we term easy-to-hard generalization. Our key insight is that an evaluator (reward model) trained on supervision for easier tasks can be effectively used for scoring candidate solutions of harder tasks and hence facilitate easy-to-hard generalization over different levels of tasks. Based on this insight, we propose a novel approach to scalable alignment,…
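To make the scoring step concrete, here is a minimal Python sketch. It is not the authors' implementation (that lives in the edward-sun/easy-to-hard repository). It assumes a hypothetical `reward_fn` that maps a solution string to a non-negative scalar score, standing in for an evaluator trained only on easier problems. The sketch splits problems by MATH level into an easy pool and a hard pool, then picks a final answer for a hard problem either by best-of-n selection or by reward-weighted voting over final answers. The names `split_by_level`, `best_of_n`, and `weighted_vote`, and the dictionary-based problem format, are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical data format: each problem is a dict with an integer "level" (1-5 in MATH).
Problem = Dict[str, object]
# A candidate is (full solution text, extracted final answer).
Candidate = Tuple[str, str]
# Stand-in for a reward model trained on easy problems: solution text -> score.
RewardFn = Callable[[str], float]


def split_by_level(
    problems: Sequence[Problem], max_easy_level: int = 3
) -> Tuple[List[Problem], List[Problem]]:
    """Partition problems into an easy pool (levels <= max_easy_level) and a hard pool."""
    easy = [p for p in problems if int(p["level"]) <= max_easy_level]
    hard = [p for p in problems if int(p["level"]) > max_easy_level]
    return easy, hard


def best_of_n(candidates: Sequence[Candidate], reward_fn: RewardFn) -> str:
    """Return the final answer of the single highest-scoring candidate solution."""
    best_solution, best_answer = max(candidates, key=lambda c: reward_fn(c[0]))
    return best_answer


def weighted_vote(candidates: Sequence[Candidate], reward_fn: RewardFn) -> str:
    """Sum reward scores per distinct final answer and return the answer with the largest total.

    Assumes scores are non-negative (e.g., probabilities), so agreement adds support.
    """
    totals: Dict[str, float] = defaultdict(float)
    for solution, answer in candidates:
        totals[answer] += reward_fn(solution)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Toy scores in place of a real reward model's outputs.
    scores = {"sol_a": 0.9, "sol_b": 0.5, "sol_c": 0.5}
    cands = [("sol_a", "12"), ("sol_b", "7"), ("sol_c", "7")]
    print(best_of_n(cands, lambda s: scores[s]))
    print(weighted_vote(cands, lambda s: scores[s]))
```

With these toy scores, best-of-n returns the single top-scoring solution's answer ("12"), whereas weighted voting lets two moderately scored solutions that agree ("7") outweigh it. Which aggregation works better on hard problems is an empirical question that this sketch does not settle.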


Code & Models

Repositories

edward-sun/easy-to-hard
PyTorch · Official

