Latent Principle Discovery for Language Model Self-Improvement

Keshav Ramji; Tahira Naseem; Ramón Fernandez Astudillo

arXiv:2505.16927 · cs.CL · November 17, 2025

TL;DR

This paper introduces a method for language models to self-improve by automatically discovering and applying latent behavioral principles through a self-correction framework, leading to enhanced response quality.

Contribution

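The paper presents an approach that mines latent principles from the language model itself, compresses them into an interpretable set via clustering, and teaches the model to invoke them when refining its responses, enabling automated self-improvement without labor-intensive manual annotation.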

Findings

01 Achieved +8-10% win-rate improvement on AlpacaEval

02 Improved MT-Bench scores by +0.3 on average

03 Increased principle-following win-rate by +19-23% on IFEval

Abstract

When language model (LM) users aim to improve the quality of its generations, it is crucial to specify concrete behavioral attributes that the model should strive to reflect. However, curating such principles across many domains, even non-exhaustively, requires a labor-intensive annotation process. To automate this process, we propose eliciting these latent attributes that guide model reasoning toward human-preferred responses by explicitly modeling them in a self-correction setting. Our approach mines new principles from the LM itself and compresses the discovered elements to an interpretable set via clustering. Specifically, we employ a form of posterior-regularized Monte Carlo Expectation-Maximization to both identify a condensed set of the most effective latent principles and teach the LM to strategically invoke them in order to intrinsically refine its responses. We demonstrate…

Peer Reviews

No public reviews on file for this paper yet.

Videos

Latent Principle Discovery for Language Model Self-Improvement · SlidesLive

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training