Towards Scalable Automated Alignment of LLMs: A Survey

Boxi Cao; Keming Lu; Xinyu Lu; Jiawei Chen; Mengjie Ren; Hao Xiang; Peilin Liu; Yaojie Lu; Ben He; Xianpei Han; Le Sun; Hongyu Lin; Bowen Yu

arXiv:2406.01252·cs.CL·September 4, 2024

Open Access · 1 Repo

TL;DR

This survey reviews emerging automated alignment methods for large language models, emphasizing scalability and effectiveness as LLM capabilities surpass human performance, and categorizes approaches based on alignment signal sources.

Contribution

It systematically categorizes recent automated alignment techniques, analyzes their mechanisms, and discusses future directions for scalable, automated LLM alignment.

Findings

01. Automated alignment methods are categorized into four main types.

02. Current approaches show promise but face challenges in scalability and reliability.

03. Understanding underlying mechanisms is key to advancing automated alignment.

Abstract

Alignment is the most critical step in building large language models (LLMs) that meet human needs. As LLMs rapidly develop and gradually surpass human capabilities, traditional alignment methods based on human annotation are increasingly unable to meet scalability demands. There is therefore an urgent need to explore new sources of automated alignment signals and new technical approaches. In this paper, we systematically review recently emerging automated alignment methods and explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we group existing automated alignment methods into four major categories based on the source of the alignment signal and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable…

Code & Models

Repositories

cascip/awesome-auto-alignment · Official

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Digital Rights Management and Security