Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

Zhiqing Sun; Yikang Shen; Qinhong Zhou; Hongxin Zhang; Zhenfang Chen; David Cox; Yiming Yang; Chuang Gan

arXiv:2305.03047 · cs.LG · December 5, 2023 · 63 cites

TL;DR

This paper introduces SELF-ALIGN, a principle-driven self-alignment method for large language models that minimizes human supervision by combining synthetic data, in-context learning, and fine-tuning. The resulting AI assistant, Dromedary, outperforms Text-Davinci-003 and Alpaca on benchmarks.

Contribution

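The paper presents a self-alignment approach that combines principle-driven reasoning with the generative power of LLMs, reducing the human supervision needed to align an AI assistant while improving benchmark performance over Text-Davinci-003 and Alpaca.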

Findings

01. Dromedary outperforms Text-Davinci-003 and Alpaca on benchmarks.

02. Fewer than 300 lines of human annotations were used.

03. Self-alignment is effective with minimal human input.

Abstract

Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to the high cost of obtaining human supervision and the related issues on quality, reliability, diversity, self-consistency, and undesirable biases. To address these challenges, we propose a novel approach called SELF-ALIGN, which combines principle-driven reasoning and the generative power of LLMs for the self-alignment of AI agents with minimal human supervision. Our approach encompasses four stages: first, we use an LLM to generate synthetic prompts, and a topic-guided method to augment the…

Code & Models

Repositories

IBM/Dromedary
PyTorch · Official

Datasets

zhiqings/dromedary-65b-verbose-clone-v0
dataset · 16 downloads

Videos

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN · Balanced Selection