Large Language Model Alignment: A Survey

Tianhao Shen; Renren Jin; Yufei Huang; Chuang Liu; Weilong Dong; Zishan Guo; Xinwei Wu; Yan Liu; Deyi Xiong

arXiv:2309.15025 · cs.CL · September 27, 2023 · 34 cites

TL;DR

This survey comprehensively reviews methods for aligning large language models with human values, discussing techniques, challenges, benchmarks, and future research directions to ensure safer and more reliable AI systems.

Contribution

It categorizes existing alignment methods into outer and inner alignment, and explores interpretability, vulnerabilities, and evaluation benchmarks for LLMs.

Findings

01 Overview of alignment techniques and their categorization

02 Discussion of interpretability and adversarial vulnerabilities

03 Summary of benchmarks and evaluation methodologies

Abstract

Recent years have witnessed remarkable progress in large language models (LLMs). Such advancements, while garnering significant attention, have concurrently elicited various concerns. The potential of these models is undeniably vast; however, they may yield texts that are imprecise, misleading, or even detrimental. Consequently, it becomes paramount to employ alignment techniques to ensure that these models exhibit behaviors consistent with human values. This survey endeavors to furnish an extensive exploration of alignment methodologies designed for LLMs, in conjunction with the extant capability research in this domain. Adopting the lens of AI alignment, we categorize the prevailing methods and emergent proposals for the alignment of LLMs into outer and inner alignment. We also probe into salient issues including the models' interpretability, and potential vulnerabilities to…

Taxonomy

Topics: Software Engineering Research · Topic Modeling