Reinforcement Learning from Meta-Evaluation: Aligning Language Models Without Ground-Truth Labels

Micah Rentschler; Jesse Roberts

arXiv:2601.21268 · cs.NE · January 30, 2026

Open Access

TL;DR

This paper presents RLME (Reinforcement Learning from Meta-Evaluation), a reinforcement learning approach that trains large language models using an evaluator's answers to natural-language meta-questions as the reward signal, removing the need for ground-truth labels or task-specific verifiers.

Contribution

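RLME introduces a label-free reinforcement learning method for LLMs that rewards the generator with an evaluator's probability of a positive judgment on meta-questions. This makes training scalable to domains where correctness is ambiguous or expensive to verify, and controllable across multiple objectives.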
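A meta-evaluation reward can be pictured as follows: the evaluator is asked a yes/no meta-question about a generated answer (e.g. "Is the answer correct?"), and its probability of answering "Yes" becomes a scalar reward in [0, 1]. The sketch below assumes the evaluator exposes logits for a "Yes" and a "No" answer token and renormalizes over just those two tokens; the function name `meta_eval_reward` and this two-token renormalization are hypothetical choices for illustration, not necessarily how the paper extracts the probability.

```python
import math


def meta_eval_reward(logit_yes: float, logit_no: float) -> float:
    """Hypothetical reward: the evaluator's probability of answering "Yes",
    renormalized over only the "Yes" and "No" answer tokens.

    Equivalent to softmax([logit_yes, logit_no])[0], computed as a
    numerically stable sigmoid of the logit difference.
    """
    diff = logit_yes - logit_no
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)  # diff < 0, so this cannot overflow
    return z / (1.0 + z)


# Example: the evaluator slightly prefers "Yes" to the meta-question
# "Is the answer correct?" for one candidate response.
print(round(meta_eval_reward(1.2, 0.4), 3))
```

Using a probability rather than a hard yes/no verdict gives a graded reward, so two responses the evaluator would both accept can still be ranked by how confidently it accepts them.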

Findings

1. Achieves accuracy comparable to label-based methods
2. Enables training without ground-truth labels in open domains
3. Allows controllable trade-offs among multiple objectives

Abstract

Most reinforcement learning (RL) methods for training large language models (LLMs) require ground-truth labels or task-specific verifiers, limiting scalability when correctness is ambiguous or expensive to obtain. We introduce Reinforcement Learning from Meta-Evaluation (RLME), which optimizes a generator using reward derived from an evaluator's answers to natural-language meta-questions (e.g., "Is the answer correct?" or "Is the reasoning logically consistent?"). RLME treats the evaluator's probability of a positive judgment as a reward and updates the generator via group-relative policy optimization, enabling learning without labels. Across a suite of experiments, we show that RLME achieves accuracy and sample efficiency comparable to label-based training, enables controllable trade-offs among multiple objectives, steers models toward reliable reasoning patterns rather than post-hoc…
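The group-relative step can be sketched as follows, assuming several responses are sampled per prompt and each is scored by the evaluator on more than one meta-question. Combining the per-question probabilities with a weighted sum is an assumption made here to illustrate how weights could steer the trade-off between objectives; the names `combine_rewards` and `group_relative_advantages` are hypothetical. The advantage is the standard GRPO-style standardization of each reward against its group; the clipped policy-gradient update that consumes these advantages is omitted.

```python
from statistics import mean, pstdev
from typing import Dict, List


def combine_rewards(per_question: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical scalarization: weighted sum of per-meta-question rewards.

    Changing `weights` shifts the trade-off between objectives
    (e.g. correctness vs. logical consistency).
    """
    return sum(weights[q] * per_question[q] for q in weights)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against the other
    responses sampled for the same prompt (population std, plus eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, each scored by the evaluator's
# P("Yes") on two illustrative meta-questions (made-up values).
weights = {"correct": 0.7, "consistent": 0.3}
group = [
    {"correct": 0.9, "consistent": 0.8},
    {"correct": 0.2, "consistent": 0.9},
    {"correct": 0.6, "consistent": 0.4},
    {"correct": 0.1, "consistent": 0.2},
]
rewards = [combine_rewards(g, weights) for g in group]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Because advantages are computed within each group, no learned value function or absolute reward scale is needed: a response is pushed up only if the evaluator rates it above its siblings for the same prompt.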

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)