UltraFeedback: Boosting Language Models with Scaled AI Feedback
Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Bingxiang He, Wei Zhu, Yuan Ni, Guotong Xie, Ruobing Xie, Yankai Lin, Zhiyuan Liu, Maosong Sun

TL;DR
This paper introduces UltraFeedback, a large-scale AI feedback dataset collected automatically to improve open-source language models. It addresses the limited size and diversity of human feedback datasets and demonstrates the dataset's effectiveness for model alignment.
Contribution
The paper presents UltraFeedback, a new large-scale and diverse AI feedback dataset that improves language model alignment without being bound by the size and topic limits of human feedback.
Findings
UltraFeedback contains over 1 million GPT-4 feedback instances.
Models trained with UltraFeedback outperform baselines on chat benchmarks.
AI feedback effectively improves open-source language model alignment.
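The "over 1 million" figure is easier to read as a product of dataset dimensions. The breakdown below is a minimal sketch under an assumption: it uses the commonly cited configuration of roughly 64k instructions, four completions per instruction, and ratings on four aspects. None of these figures appear on this page, so treat them as illustrative.

```python
# Assumed dataset dimensions (not stated on this page; illustrative only).
NUM_INSTRUCTIONS = 64_000
COMPLETIONS_PER_INSTRUCTION = 4
ASPECTS_PER_COMPLETION = 4

# Each instruction gets several model responses, and each response is rated per aspect.
completions = NUM_INSTRUCTIONS * COMPLETIONS_PER_INSTRUCTION
feedback = completions * ASPECTS_PER_COMPLETION

print(f"{completions:,} completions -> {feedback:,} feedback instances")
# prints "256,000 completions -> 1,024,000 feedback instances"
```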
Abstract
Learning from human feedback has become a pivotal technique for aligning large language models (LLMs) with human preferences. However, acquiring large volumes of high-quality human feedback is bottlenecked by time, labor, and human capability, so current datasets are small or cover limited topics. This in turn hinders feedback learning and alignment research within the open-source community. To address this issue, we explore how to go beyond human feedback and automatically collect high-quality AI feedback as a scalable alternative. Specifically, we identify scale and diversity as the key factors that make feedback data effective. Accordingly, we first broaden instructions and responses in both amount and breadth to encompass a wider range of user-assistant interactions. Then, we meticulously apply a series of techniques to mitigate annotation biases for more reliable…
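The abstract describes two construction steps: widening the pool of instructions and responses, then collecting fine-grained AI ratings. The sketch below shows one way those steps could fit together, ending with a common downstream use of ratings, which is turning them into preference pairs. It is a minimal illustration and not the authors' pipeline. The following are hypothetical choices made here for illustration:

- the model pool
- the `generate` and `rate` callables, which stand in for real LLM calls and the GPT-4 annotator
- the four aspect names
- the mean-score rule for forming pairs

```python
import random
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical rating aspects; each is scored on a 1-5 scale in this sketch.
ASPECTS = ("instruction_following", "truthfulness", "honesty", "helpfulness")


@dataclass
class Completion:
    model: str
    text: str
    ratings: dict = field(default_factory=dict)  # aspect -> score

    def overall(self) -> float:
        # Average over aspects; an illustrative aggregation rule, not the paper's.
        return mean(self.ratings[a] for a in ASPECTS)


def sample_completions(instruction, model_pool, generate, k=4, seed=0):
    """Ask k distinct models from the pool to answer, so responses vary in style and quality."""
    rng = random.Random(seed)
    models = rng.sample(model_pool, k)
    return [Completion(m, generate(m, instruction)) for m in models]


def annotate(completions, rate):
    """Attach per-aspect scores; `rate` stands in for an AI annotator such as GPT-4."""
    for c in completions:
        c.ratings = {a: rate(c, a) for a in ASPECTS}
    return completions


def to_preference_pairs(instruction, completions):
    """Build (prompt, chosen, rejected) triples from every pair with unequal overall scores."""
    pairs = []
    for i, a in enumerate(completions):
        for b in completions[i + 1:]:
            if a.overall() == b.overall():
                continue  # a tie gives no preference signal
            chosen, rejected = (a, b) if a.overall() > b.overall() else (b, a)
            pairs.append((instruction, chosen.text, rejected.text))
    return pairs


if __name__ == "__main__":
    pool = ["model-a", "model-b", "model-c", "model-d", "model-e"]
    fake_scores = {"model-a": 5, "model-b": 4, "model-c": 3, "model-d": 2, "model-e": 1}
    comps = sample_completions("Explain RLHF.", pool, lambda m, q: f"{m}:{q}")
    comps = annotate(comps, lambda c, aspect: fake_scores[c.model])
    for prompt, chosen, rejected in to_preference_pairs("Explain RLHF.", comps):
        print(chosen, ">", rejected)
```

With four completions per instruction, each instruction yields up to six pairs, and ties are skipped. Averaging the aspects is only one possible aggregation. Keeping the aspects separate would instead allow aspect-specific preference pairs.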
Code & Models
- 🤗 yanolja/YanoljaNEXT-EEVE-Instruct-10.8B · model · 3.6k downloads · ♡ 165
- 🤗 teddylee777/EEVE-Korean-Instruct-10.8B-v1.0-gguf · model · 811 downloads · ♡ 23
- 🤗 openbmb/UltraRM-13b · model · 615 downloads · ♡ 61
- 🤗 openbmb/UltraCM-13b · model · 13 downloads · ♡ 20
- 🤗 HuggingFaceH4/zephyr-7b-alpha · model · 4.0k downloads · ♡ 1120
- 🤗 HuggingFaceH4/zephyr-7b-beta · model · 136k downloads · ♡ 1836
- 🤗 alvarobartt/UltraCM-13B-GGUF · model · 207 downloads · ♡ 4
- 🤗 yanolja/Bookworm-10.7B-v0.4-DPO · model · 9 downloads · ♡ 11
- 🤗 yanolja/YanoljaNEXT-EEVE-Instruct-2.8B · model · 268 downloads · ♡ 30
- 🤗 maywell/EEVE-Korean-Instruct-10.8B-v1.0-32k · model · 17 downloads · ♡ 10
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: ALIGN · Multi-Head Attention · Attention Is All You Need · Dropout · Dense Connections · Linear Layer · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection
