Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

TL;DR
This paper introduces SELF-ALIGN, a principle-driven self-alignment method for large language models that minimizes human supervision by combining synthetic data, in-context learning, and fine-tuning. The resulting AI assistant, Dromedary, outperforms Text-Davinci-003 and Alpaca on benchmarks.
Contribution
The paper presents a self-alignment approach that combines principle-driven reasoning with the generative power of LLMs, reducing the human supervision needed to align an AI assistant.
Findings
Dromedary outperforms Text-Davinci-003 and Alpaca on benchmarks.
The method uses fewer than 300 lines of human annotations.
Self-alignment remains effective with minimal human input.
Abstract
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: ALIGN · Balanced Selection
