RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Sunzhu Li; Jiale Zhao; Miteto Wei; Huimin Ren; Yang Zhou; Jingwen Yang; Shunyu Liu; Kaike Zhang; Wei Chen

arXiv:2601.08430·cs.AI·January 29, 2026

TL;DR

RubricHub introduces a large, multi-domain rubric dataset generated through an automated coarse-to-fine process, significantly improving the performance of reinforcement learning models in reasoning tasks.

Contribution

The paper presents a novel automated rubric generation framework and a large-scale dataset, enabling enhanced supervision for reasoning-intensive open-ended generation tasks.

Findings

01. The RubricHub dataset contains approximately 110,000 examples across multiple domains.

02. Post-training with RubricHub improves model performance, achieving state-of-the-art (SOTA) results on HealthBench.

03. The approach surpasses proprietary models such as GPT-5 on specific reasoning benchmarks.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has driven substantial progress in reasoning-intensive domains like mathematics. However, optimizing open-ended generation remains challenging due to the lack of ground truth. While rubric-based evaluation offers a structured proxy for verification, existing methods suffer from scalability bottlenecks and coarse criteria, resulting in a supervision ceiling effect. To address this, we propose an automated Coarse-to-Fine Rubric Generation framework. By synergizing principle-guided synthesis, multi-model aggregation, and difficulty evolution, our approach produces comprehensive and highly discriminative criteria capable of capturing subtle nuances. Based on this framework, we introduce RubricHub, a large-scale ($\sim$110k), multi-domain dataset. We validate its utility through a two-stage post-training pipeline comprising…
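
To make the idea of rubric-based evaluation as a proxy reward more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a rubric is a list of weighted criteria and that the reward is the weighted fraction of criteria a response satisfies. The names `Criterion`, `rubric_reward`, and `keyword_judge` are hypothetical, and the keyword matcher is only a toy stand-in for whatever grader (for example, an LLM judge) a real pipeline would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure, assumed for illustration)."""
    description: str  # what a good response should cover
    weight: float     # relative importance; assumed positive in this sketch


def rubric_reward(
    response: str,
    rubric: List[Criterion],
    judge: Callable[[str, Criterion], bool],
) -> float:
    """Return the weighted fraction of criteria the response satisfies, in [0, 1]."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in rubric if judge(response, c))
    return earned / total


def keyword_judge(response: str, criterion: Criterion) -> bool:
    # Toy grader: treats the criterion description as a keyword to look for.
    # A real system would replace this with a model-based or human judgment.
    return criterion.description.lower() in response.lower()


rubric = [
    Criterion("dosage", 2.0),
    Criterion("side effects", 1.0),
    Criterion("see a doctor", 1.0),
]
reward = rubric_reward("Check the dosage and see a doctor.", rubric, keyword_judge)
print(reward)  # 0.75: criteria worth 3.0 of the 4.0 total weight are met
```

Under this assumed scoring rule, coarser rubrics give fewer distinct reward levels, which is one way to read the abstract's point that coarse criteria limit how well responses can be told apart.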

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning and Data Classification