Aligning Large Language Models through Synthetic Feedback

Sungdong Kim; Sanghwan Bae; Jamin Shin; Soyoung Kang; Donghyun Kwak; Kang Min Yoo; Minjoon Seo

arXiv:2305.13735 · cs.CL · October 24, 2023 · 2 cites

TL;DR

This paper introduces an alignment framework that trains large language models with synthetic feedback, reducing reliance on human annotations and proprietary models. The resulting model, ALMoST, outperforms recent open-source models on alignment benchmarks.

Contribution

The paper presents a new framework for LLM alignment using synthetic feedback for reward modeling, eliminating the need for extensive human data or proprietary models.

Findings

01. ALMoST outperforms recent open-source models on alignment benchmarks.

02. Synthetic feedback effectively guides LLM alignment without extensive human annotations.

03. Human evaluations favor ALMoST over Alpaca and Dolly-v2.

Abstract

Aligning large language models (LLMs) to human values has become increasingly important as it enables sophisticated steering of LLMs. However, it requires significant human demonstrations and feedback or distillation from proprietary LLMs such as ChatGPT. In this work, we propose a novel alignment learning framework with synthetic feedback not dependent on extensive human annotations and proprietary LLMs. First, we perform reward modeling (RM) with synthetic feedback by contrasting responses from vanilla LLMs with various sizes and prompts. Then, we use the RM to simulate high-quality demonstrations to train a supervised policy and further optimize the model with reinforcement learning. Our resulting model, Aligned Language Model with Synthetic Training dataset (ALMoST), outperforms recent open-sourced models, which are trained on the outputs of InstructGPT or human-annotated…
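The reward-modeling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the generator configurations, the helper names, and the ranking heuristic (a response from a larger model given more in-context demonstrations is assumed to be better) are all assumptions made for illustration, and the loss is the standard pairwise ranking loss commonly used for reward models.

```python
import math
from itertools import combinations

# Hypothetical generator configurations (illustrative, not taken from the page):
# (name, model size in billions of parameters, number of in-context demonstrations).
CONFIGS = [
    ("30B-3shot", 30.0, 3),
    ("13B-3shot", 13.0, 3),
    ("7B-1shot", 7.0, 1),
    ("1.3B-0shot", 1.3, 0),
]


def assumed_rank_key(config):
    """Assumed heuristic: larger size and more demonstrations rank higher."""
    _, size, shots = config
    return (size, shots)


def synthetic_pairs(responses):
    """Build (chosen, rejected) comparison pairs without human labels.

    `responses` maps a configuration name to the text that configuration
    produced for one prompt.
    """
    ordered = sorted(CONFIGS, key=assumed_rank_key, reverse=True)
    pairs = []
    # combinations() preserves order, so the first element always ranks higher.
    for better, worse in combinations(ordered, 2):
        pairs.append((responses[better[0]], responses[worse[0]]))
    return pairs


def pairwise_loss(score_chosen, score_rejected):
    """Pairwise ranking loss: -log(sigmoid(score_chosen - score_rejected))."""
    diff = score_chosen - score_rejected
    return math.log1p(math.exp(-diff))


if __name__ == "__main__":
    demo = {name: f"response from {name}" for name, _, _ in CONFIGS}
    for chosen, rejected in synthetic_pairs(demo):
        print(f"{chosen!r} preferred over {rejected!r}")
```

A reward model trained on such pairs would be updated to lower `pairwise_loss`, which happens when it scores the assumed-better response above the assumed-worse one.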

Code & Models

Repositories

naver-ai/almost
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam