Towards Scalable Automated Alignment of LLMs: A Survey
Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

TL;DR
This survey reviews emerging automated alignment methods for large language models, emphasizing scalability and effectiveness as LLM capabilities surpass human performance, and categorizes approaches based on alignment signal sources.
Contribution
It systematically categorizes recent automated alignment techniques, analyzes their mechanisms, and discusses future directions for scalable, automated LLM alignment.
Findings
Automated alignment methods are categorized into four main types.
Current approaches show promise but face challenges in scalability and reliability.
Understanding underlying mechanisms is key to advancing automated alignment.
Abstract
Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human annotation are increasingly unable to meet scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approaches. In this paper, we systematically review the recently emerging methods of automated alignment, attempting to explore how to achieve effective, scalable, automated alignment once the capabilities of LLMs exceed those of humans. Specifically, we categorize existing automated alignment methods into four major categories based on the sources of alignment signals and discuss the current status and potential development of each category. Additionally, we explore the underlying mechanisms that enable…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Digital Rights Management and Security
