EvoLM: Self-Evolving Language Models through Co-Evolved Discriminative Rubrics

Shuyue Stella Li; Rui Xin; Teng Xiao; Yike Wang; Rulin Shao; Zoey Hao; Melanie Sclar; Sewoong Oh; Faeze Brahman; Pang Wei Koh; Yulia Tsvetkov

arXiv:2605.03871 · cs.AI · May 6, 2026

TL;DR

EvoLM introduces a self-evolving post-training method for language models that uses discriminative rubrics generated by the model itself as reward signals, removing the need for external supervision.

Contribution

The paper presents an approach in which a single language model is trained in alternation as a rubric generator and as a policy, so that it improves using only its own evaluative capacity.

Findings

01. EvoLM-trained Qwen3-8B outperforms GPT-4.1 on RewardBench-2 by 25.7%.

02. The policy achieves 69.3% on OLMo3-Adapt, surpassing models trained with external rubrics.

03. Self-supervised rubrics enable significant performance gains without human annotations.

Abstract

Language models encode substantial evaluative knowledge from pretraining, yet current post-training methods rely on external supervision (human annotations, proprietary models, or scalar reward models) to produce reward signals. Each imposes a ceiling. Human judgment cannot supervise capabilities beyond its own, proprietary APIs create dependencies, and verifiable rewards cover only domains with ground-truth answers. Self-improvement from a model's own evaluative capacity is a reward source that scales with the model itself, yet remains largely untapped by current methods. We introduce EVOLM, a post-training method that structures this capacity into explicit discriminative rubrics and uses them as training signal. EVOLM trains two capabilities within a single language model in alternation: (1) a rubric generator producing instance-specific evaluation criteria optimized for…
