Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

Tianze Xu; Yanzhao Zheng; Pengrui Lu; Lyumanshan Ye; Yong Wu; Zhentao Zhang; Yuanqiang Yu; Chao Ma; Jihuai Zhu; Pengfei Liu; Baohua Dong; Hangcheng Zhu; Ruohui Huang; Gang Yu

arXiv:2604.02795·cs.CL·April 6, 2026

TL;DR

This paper introduces Rubrics to Tokens (RTT), a rubric-based RL framework that links response-level rubric scores to token-level rewards through a Token-Level Relevance Discriminator and a dedicated normalization for multi-dimensional rewards, improving instruction following in language models.

Contribution

The paper proposes RTT, a novel rubric-based RL method that bridges coarse response scores and fine-grained token rewards, with a new normalization technique for multi-dimensional rewards.

Findings

01 RTT outperforms baselines in instruction-level accuracy.

02 RTT improves rubric-level accuracy across models.

03 The normalization method effectively handles multi-dimensional rewards.

Abstract

Rubric-based Reinforcement Learning (RL) has emerged as a promising approach for aligning Large Language Models (LLMs) with complex, open-domain instruction following tasks. However, existing methods predominantly rely on response-level rewards, introducing severe reward sparsity and reward ambiguity problems. To address these issues, we propose Rubrics to Tokens (RTT), a novel rubric-based RL framework that bridges coarse response-level scores and fine-grained token-level credit assignment. RTT introduces a Token-Level Relevance Discriminator to predict which tokens in the response are responsible for a specific constraint, and optimizes the policy model via RTT-GRPO, which integrates response-level and token-level advantages within a unified framework. Furthermore, when transitioning from one-dimensional, outcome-level reward to three-dimensional reward space in the token-level…
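
The idea of combining a response-level advantage with token-level credit can be illustrated with a small sketch. This is not the paper's RTT-GRPO objective or its normalization, neither of which is specified in the text above. It assumes plain GRPO-style group normalization (subtract the group mean, divide by the group standard deviation), a binary token-to-constraint relevance mask standing in for the discriminator's output, and a hypothetical mixing weight `alpha`. The function names `group_normalize` and `blend_token_advantages` are invented for illustration.

```python
from statistics import mean, pstdev


def group_normalize(values, eps=1e-6):
    """GRPO-style normalization across a group of sampled responses."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def blend_token_advantages(response_adv, constraint_advs, relevance, alpha=0.5):
    """Hypothetical blend for ONE response (an assumption, not the paper's rule).

    response_adv:    scalar advantage shared by every token
    constraint_advs: one advantage per rubric constraint
    relevance:       relevance[k][t] = 1 if token t is relevant to constraint k
    alpha:           assumed weight on the token-level term
    """
    num_tokens = len(relevance[0])
    blended = []
    for t in range(num_tokens):
        # Only constraints marked relevant to token t contribute to its credit.
        token_term = sum(adv * row[t] for adv, row in zip(constraint_advs, relevance))
        blended.append(response_adv + alpha * token_term)
    return blended


# Toy group: two sampled responses judged against two rubric constraints.
rubric_scores = [[1.0, 0.0], [1.0, 1.0]]  # rubric_scores[i][k]

# Response-level reward: fraction of constraints satisfied.
response_rewards = [mean(scores) for scores in rubric_scores]
response_advs = group_normalize(response_rewards)

# Per-constraint advantages: normalize each constraint's column across the group,
# then transpose back so constraint_advs[i][k] belongs to response i.
constraint_advs = list(zip(*[group_normalize(col) for col in zip(*rubric_scores)]))

# Assumed relevance mask for response 0: tokens 0-1 relate to constraint 0,
# tokens 2-3 relate to constraint 1 (the one this response failed).
relevance_0 = [[1, 1, 0, 0], [0, 0, 1, 1]]
token_advs_0 = blend_token_advantages(response_advs[0], constraint_advs[0], relevance_0)
```

In this toy group, both responses satisfy the first constraint, so it carries no token-level signal, while the tokens of response 0 tied to the failed second constraint receive a lower advantage than its other tokens. That is the kind of finer-grained credit assignment a response-level reward alone cannot express.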

Code & Models

Repositories

turleing/Rubrics-To-Tokens (GitHub)
