Learning to Reason in LLMs by Expectation Maximization

Junghyun Lee; Branislav Kveton; Anup Rao; Subhojyoti Mukherjee; Ryan A. Rossi; Sunav Choudhary; Alexa Siu

arXiv:2512.20169·cs.LG·February 3, 2026

Open Access

TL;DR

This paper formalizes reasoning in large language models as a latent variable model trained with a reward-based filtered expectation-maximization (FEM) objective, and compares three schemes for sampling rationales that justify correct answers.

Contribution

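The paper formalizes reasoning as a latent variable model, derives a reward-based filtered expectation-maximization (FEM) objective, and compares three rationale sampling schemes: rejection sampling with a budget, self-taught reasoner (STaR), and prompt posterior sampling (PPS).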
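
For orientation, the latent-variable view can be written with the textbook EM lower bound. This is a generic sketch, not necessarily the paper's exact FEM objective. The symbols are assumptions introduced here: x is the question, z the latent rationale, y* the correct answer, θ the model parameters, and q an arbitrary sampling distribution over rationales.

```latex
\log p_\theta(y^\star \mid x)
  = \log \sum_{z} p_\theta(z \mid x)\, p_\theta(y^\star \mid x, z)
  \;\ge\; \mathbb{E}_{z \sim q}\!\left[ \log p_\theta(z \mid x) + \log p_\theta(y^\star \mid x, z) \right] + H(q)
```

The bound is tight when q(z) equals the posterior p_θ(z | x, y*), which is why the choice of sampling distribution over rationales matters: the closer the sampled rationales are to ones that actually justify the correct answer, the better the surrogate objective tracks the true likelihood.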

Findings

01

Prompt posterior sampling (PPS) outperforms the other sampling schemes in the experiments.

02

Conditioning on correct answers improves rationale quality.

03

Sampling scheme choice significantly affects model performance.

Abstract

Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a latent variable model and derive a reward-based filtered expectation-maximization (FEM) objective for learning to reason. This view connects EM and modern reward-based optimization, and shows that the main challenge lies in designing a sampling distribution of rationales that justify correct answers. We instantiate and compare three sampling schemes: rejection sampling with a budget, self-taught reasoner (STaR), and prompt posterior sampling (PPS), which only keeps the rationalization stage of STaR that conditions on the correct answer in the prompt. We experiment with LLM-as-a-judge calibration and summarization from feedback tasks, where conditioning on the correct answer provides strong guidance for generating rationales. Our experiments show the…
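
The three sampling schemes named in the abstract can be contrasted with a minimal Python sketch. It is illustrative only and is not the authors' implementation: the `generate(question, hint)` interface, the scripted toy model, and the exact-match filter on the answer are all assumptions made here so the control flow can be run without an LLM.

```python
from typing import Callable, Iterable, Optional, Tuple

Sample = Tuple[str, str]  # (rationale, answer)
# Hypothetical model interface: generate(question, hint) -> (rationale, answer).
# hint=None means the prompt contains only the question.
Generate = Callable[[str, Optional[str]], Sample]


def rejection_sampling(generate: Generate, question: str, correct: str,
                       budget: int) -> Optional[Sample]:
    """Sample from the question alone, at most `budget` times; keep the first hit."""
    for _ in range(budget):
        rationale, answer = generate(question, None)
        if answer == correct:
            return rationale, answer
    return None  # budget exhausted without a correct answer


def star(generate: Generate, question: str, correct: str) -> Optional[Sample]:
    """STaR-style: try without a hint, then rationalize with the correct answer."""
    rationale, answer = generate(question, None)
    if answer == correct:
        return rationale, answer
    rationale, answer = generate(question, correct)  # rationalization stage
    return (rationale, answer) if answer == correct else None


def pps(generate: Generate, question: str, correct: str) -> Optional[Sample]:
    """PPS-style: only the rationalization stage, conditioning on the correct answer."""
    rationale, answer = generate(question, correct)
    # Assumed filter: keep the sample only if the answer matches.
    return (rationale, answer) if answer == correct else None


def make_toy_model(unhinted_answers: Iterable[str]) -> Generate:
    """Scripted stand-in for an LLM (assumption): replays fixed answers when
    unhinted and always follows the hint when one is given."""
    answers = iter(unhinted_answers)

    def generate(question: str, hint: Optional[str]) -> Sample:
        if hint is not None:
            return f"hinted rationale for {question}", hint
        answer = next(answers)
        return f"rationale leading to {answer}", answer

    return generate


if __name__ == "__main__":
    print(rejection_sampling(make_toy_model(["3", "5", "4"]), "2+2", "4", budget=2))
    print(star(make_toy_model(["3"]), "2+2", "4"))
    print(pps(make_toy_model([]), "2+2", "4"))
```

In this toy setup, rejection sampling can return nothing when the budget runs out, STaR falls back to the hinted prompt only after an unhinted failure, and PPS always uses the hinted prompt. A real model would not necessarily follow the hint, which is why the answer filter is kept in all three functions.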

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications