HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev

TL;DR
HelpSteer3-Preference is an open (CC-BY-4.0), human-annotated preference dataset of over 40,000 samples spanning multiple tasks and languages. Reward models trained on it score about 10% absolute higher on RM-Bench and JudgeBench than previously reported models.
Contribution
We introduce a high-quality, diverse, open preference dataset that enhances reward model training and aligns policy models with RLHF across various domains.
Findings
Reward models trained on HelpSteer3-Preference outperform previous models by ~10% absolute.
The dataset covers STEM, coding, and multilingual tasks.
Reward models trained with this data achieve top scores on RM-Bench (82.4%) and JudgeBench (73.7%).
Abstract
Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks relating to STEM, coding and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported…
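Illustrative sketch
The abstract does not say which training objective the authors use. As a rough illustration of how pairwise preference data is commonly turned into a reward-model signal, the sketch below assumes a Bradley-Terry style loss and a simple pairwise-accuracy metric. The function names and the toy reward values are hypothetical and are not taken from the paper.

```python
import math


def bradley_terry_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the chosen response beats the rejected one.

    Assumes a Bradley-Terry model: P(chosen > rejected) = sigmoid(r_c - r_r).
    This is a common objective, not necessarily the one used in the paper.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(pairs: list[tuple[float, float]]) -> float:
    """Fraction of (chosen_reward, rejected_reward) pairs ranked correctly."""
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Toy rewards (hypothetical): each tuple is (reward for preferred, reward for rejected).
toy_pairs = [(1.0, 0.0), (0.0, 1.0), (2.0, 1.0), (3.0, 0.0)]
mean_loss = sum(bradley_terry_loss(c, r) for c, r in toy_pairs) / len(toy_pairs)
accuracy = pairwise_accuracy(toy_pairs)
```

The loss shrinks as the reward model scores the human-preferred response further above the rejected one. Pairwise accuracy is one simple way to read a percentage score from such comparisons, though the exact scoring rules of RM-Bench and JudgeBench are not described here.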
Code & Models
- 🤗 nvidia/Qwen3-Nemotron-235B-A22B-GenRM-2603 · model · 1.3k dl · ♡ 22
- 🤗 nvidia/Qwen3-Nemotron-235B-A22B-GenRM · model · 15k dl · ♡ 29
- 🤗 nvidia/Llama-3_3-Nemotron-Super-49B-GenRM · model · 122 dl · ♡ 18
- 🤗 nvidia/Llama-3_3-Nemotron-Super-49B-GenRM-Multilingual · model · 49 dl · ♡ 6
- 🤗 nvidia/Llama-3.3-Nemotron-70B-Reward · model · 59 dl · ♡ 3
- 🤗 nvidia/Llama-3.3-Nemotron-70B-Reward-Multilingual · model · 38 dl · ♡ 10
- 🤗 nvidia/Qwen-2.5-Nemotron-32B-Reward · model · 18 dl · ♡ 2
- 🤗 nvidia/Qwen-3-Nemotron-32B-Reward · model · 198 dl · ♡ 19
- 🤗 Bifrost-AI/Qwen-3-Nemotron-32B-Reward-F16 · model · 2 dl
- 🤗 nvidia/Llama-3.3-Nemotron-70B-Reward-Principle · model · 278 dl · ♡ 6
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling
