POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles

Nicolas Menet; Andreas Krause; Abbas Rahimi

arXiv:2605.07775·cs.LG·May 11, 2026

TL;DR

POETS is a compute-efficient, uncertainty-aware policy ensemble framework for large language model optimization. It reports state-of-the-art sample efficiency in scientific discovery tasks and improved optimization trajectories in reinforcement learning.

Contribution

The paper presents a compute-efficient policy ensemble that captures epistemic uncertainty directly, without training a separate uncertainty-aware reward model, and comes with theoretical regret guarantees.

Findings

01

Achieves state-of-the-art sample efficiency in scientific discovery domains.

02

Improves reinforcement learning optimization trajectories, especially in the off-policy setting.

03

Uses a shared backbone with independent LoRA branches for diversity.
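The shared-backbone idea in the last finding can be illustrated with a minimal NumPy sketch. It is not the paper's architecture: the function name `lora_ensemble_forward`, the tensor shapes, and the random initialization are all assumptions made for illustration. The point it shows is that the expensive backbone product is computed once, while each of the K ensemble members only adds a cheap rank-r correction.

```python
import numpy as np

def lora_ensemble_forward(x, W, A, B, scale=1.0):
    """Apply one shared linear layer plus K independent low-rank branches.

    x: (batch, d_in) inputs
    W: (d_out, d_in) shared (frozen) backbone weight
    A: (K, r, d_in) and B: (K, d_out, r) per-member LoRA factors
    Returns an array of shape (K, batch, d_out).
    """
    shared = x @ W.T                           # computed once for all members
    low = np.einsum("bi,kri->kbr", x, A)       # project inputs to rank r per member
    delta = np.einsum("kbr,kor->kbo", low, B)  # expand back to d_out per member
    return shared[None, :, :] + scale * delta

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
K, r, d_in, d_out, batch = 4, 2, 8, 5, 3
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(K, r, d_in))
B = rng.normal(size=(K, d_out, r))
x = rng.normal(size=(batch, d_in))

out = lora_ensemble_forward(x, W, A, B)
```

Each slice `out[k]` matches what a full layer with weight `W + B[k] @ A[k]` would produce, but the K members share one copy of `W` rather than storing K full weight matrices.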

Abstract

Balancing exploration and exploitation is a core challenge in sequential decision-making and black-box optimization. We introduce POETS (Policy Ensembles for Thompson Sampling), a novel framework that bridges uncertainty quantification and policy optimization. Our approach is grounded in the insight that policies trained with Kullback-Leibler (KL) regularization implicitly encode an underlying reward function. Building on this, POETS bypasses the complex, nested process of training an uncertainty-aware reward model and separately fitting a policy to this model. Instead, we directly train a policy ensemble to capture epistemic uncertainty by matching implicitly encoded reward functions to online, bootstrapped data. To overcome the prohibitive compute and memory constraints of ensembling Large Language Models (LLMs), POETS utilizes an efficient…
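The insight about implicitly encoded rewards can be made concrete with a small sketch. For a KL-regularized objective with temperature beta, the optimal policy is proportional to pi_ref(y) * exp(r(y) / beta), so beta * (log pi(y) - log pi_ref(y)) recovers r(y) up to a constant. The code below checks this on a toy discrete action set and then takes one Thompson-style step by sampling an ensemble member and acting with it. The function names, the noise used to build the ensemble, and the uniform member sampling are assumptions for illustration, not the procedure from the paper.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def implicit_reward(policy_logits, ref_logits, beta):
    """Reward implied by a KL-regularized policy, up to an additive constant."""
    return beta * (log_softmax(policy_logits) - log_softmax(ref_logits))

def thompson_step(ensemble_logits, rng):
    """Pick one ensemble member uniformly, then sample an action from it."""
    k = int(rng.integers(ensemble_logits.shape[0]))
    probs = np.exp(log_softmax(ensemble_logits[k]))
    action = int(rng.choice(probs.shape[0], p=probs))
    return k, action

rng = np.random.default_rng(0)
n_actions, n_members, beta = 6, 4, 0.5

# Toy reference policy and a hypothetical "true" reward over the actions.
ref_logits = rng.normal(size=n_actions)
true_reward = rng.normal(size=n_actions)

# Closed-form optimum of the KL-regularized objective: pi ∝ pi_ref * exp(r / beta).
policy_logits = ref_logits + true_reward / beta

# The implied reward differs from the true reward only by a constant shift.
diff = implicit_reward(policy_logits, ref_logits, beta) - true_reward

# Hypothetical ensemble: the optimum perturbed by small per-member noise.
ensemble_logits = policy_logits + 0.1 * rng.normal(size=(n_members, n_actions))
member, action = thompson_step(ensemble_logits, rng)
```

Because the constant shift does not change which actions a policy prefers, disagreement between ensemble members' implied rewards can serve as a proxy for epistemic uncertainty without a separate reward model.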
