OpenGenAlign: A Preference Dataset and Benchmark for Trustworthy Reward Modeling in Open-Ended, Long-Context Generation

Hanning Zhang; Juntong Song; Juno Zhu; Yuanhao Wu; Tong Zhang; Cheng Niu

arXiv:2501.13264 · cs.CL · November 13, 2025 · 2 cites

TL;DR

OpenGenAlign introduces a preference dataset and benchmark for reward modeling that targets trustworthy open-ended, long-context generation by large language models, with a focus on hallucination and reliability.

Contribution

The paper presents a new dataset and framework for training reward models tailored to open-ended long-context generation. It also defines four metrics for assessing generation quality and applies reinforcement-learning-based methods to improve generation.
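Reward models trained on pairwise preference data are commonly fit with the Bradley-Terry objective shown below. This page does not say which objective OpenGenAlign uses, so treat this as an assumed, standard choice rather than the authors' method. Here $x$ is a prompt (for example, a long context plus a question), $y_w$ and $y_l$ are the preferred and rejected responses, $\mathcal{D}$ is the preference dataset, $r_\theta$ is the reward model, and $\sigma$ is the logistic sigmoid.

```latex
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \Big[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \Big]
```

Minimizing this loss pushes the reward of the preferred response above that of the rejected one. Only the score difference matters, so the reward scale is fixed only up to an additive constant.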

Findings

1. Existing reward models perform poorly on the new benchmark.
2. The reward model trained on OpenGenAlign outperforms existing reward models in evaluation.
3. OpenGenAlign improves generation quality and can be combined with data from other domains.
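Reward-model benchmarks built from preference pairs are often scored by pairwise accuracy: the share of pairs where the model gives the chosen response a higher reward than the rejected one. The page does not state the paper's exact evaluation protocol, so the sketch below is an assumption. The function names and the toy `length_reward` scorer are hypothetical stand-ins for a real reward model.

```python
from typing import Callable, Sequence, Tuple

# A preference example: (prompt, chosen response, rejected response).
Example = Tuple[str, str, str]


def pairwise_accuracy(reward_fn: Callable[[str, str], float],
                      examples: Sequence[Example]) -> float:
    """Share of examples where the reward model scores the chosen response
    strictly higher than the rejected one (ties count as errors)."""
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = sum(
        reward_fn(prompt, chosen) > reward_fn(prompt, rejected)
        for prompt, chosen, rejected in examples
    )
    return correct / len(examples)


# Hypothetical stand-in reward model: simply prefers longer responses.
def length_reward(prompt: str, response: str) -> float:
    return float(len(response))
```

Counting ties as errors is a conservative choice. Under this metric, a reward model that assigns every response the same score gets 0 accuracy rather than 0.5.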

Abstract

Reward modeling is critical for evaluating and improving the generations of Large Language Models (LLMs). Numerous recent works have shown that it can improve safety, helpfulness, reasoning, and instruction-following ability, but its capability and generalization in open-ended long-context generation remain rarely explored. In this paper, we introduce OpenGenAlign, a framework and a high-quality dataset for developing reward models that evaluate and improve hallucination-free, comprehensive, reliable, and efficient open-ended long-context generation. We define four key metrics to assess generation quality. We then develop an automated pipeline that uses o3 to evaluate the outputs of multiple LLMs across long-context QA, Data-to-Text, and Summarization scenarios, yielding 33K high-quality preference samples with a human agreement rate of 81%. Experimental results first…
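The abstract describes scoring outputs from several LLMs on four metrics and turning those judgments into preference data checked against human labels. The sketch below shows one plausible way to do both steps. Several parts are assumptions: the metric names are placeholders taken from the four qualities listed in the abstract, the judge scores are assumed numeric and combined by an equal-weight sum, and every function name is hypothetical. None of this is the paper's actual pipeline.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Assumption: one metric per quality named in the abstract. The page does not
# define the paper's four metrics, so these names are illustrative only.
METRICS = ("hallucination_free", "comprehensive", "reliable", "efficient")


@dataclass
class Response:
    model: str                                  # which LLM produced the output
    text: str
    scores: dict = field(default_factory=dict)  # metric -> judge score (assumed numeric)


def aggregate(resp: Response) -> float:
    # Assumption: equal-weight sum over the four metrics.
    return sum(resp.scores[m] for m in METRICS)


def build_pairs(responses: list[Response]) -> list[tuple[Response, Response]]:
    """Pair responses to the same prompt as (chosen, rejected); ties are dropped."""
    pairs = []
    for a, b in combinations(responses, 2):
        sa, sb = aggregate(a), aggregate(b)
        if sa > sb:
            pairs.append((a, b))
        elif sb > sa:
            pairs.append((b, a))
    return pairs


def agreement_rate(auto_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of pairs on which the automatic preference matches the human one."""
    if not auto_labels or len(auto_labels) != len(human_labels):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(auto_labels)
```

Under this sketch, an agreement rate of 0.81 corresponds to the 81% reported above. How the paper samples pairs for human annotation is not described on this page.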
