OLLM: Options-based Large Language Models

Shashank Sharma; Janina Hoffmann; Vinay Namboodiri

arXiv:2604.19087 · cs.AI · April 22, 2026

TL;DR

OLLM introduces a set of learned options for next-token prediction in LLMs, improving diversity, controllability, and efficiency in reasoning tasks by explicitly modeling plausible token options.

Contribution

The paper presents a lightweight method that replaces standard next-token prediction with a set of learned options, improving LLMs' robustness and controllability without retraining the full model.

Findings

01

OLLM achieves up to 70% correctness on math reasoning tasks.

02

Optionized modeling improves sample efficiency in reward optimization.

03

The method reduces common misalignments compared to baseline models.
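The summary does not say how reward optimization over options is carried out. One way to picture it is a downstream policy that treats the discrete option index as its action. The minimal sketch below assumes a categorical policy over K option indices trained with REINFORCE (score-function gradient) and a running-mean baseline; the function name `reinforce_over_options`, the toy reward, and all hyperparameters are hypothetical illustrations, not the authors' training procedure.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_over_options(reward_fn, n_options, steps=500, lr=0.1, seed=0):
    """Hypothetical sketch: learn a categorical policy over discrete option indices.

    reward_fn(z) returns a scalar reward for choosing option z. The update is the
    REINFORCE gradient of log pi(z), scaled by (reward - baseline).
    """
    rng = random.Random(seed)
    logits = [0.0] * n_options
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        z = rng.choices(range(n_options), weights=probs)[0]
        r = reward_fn(z)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)  # running-mean baseline reduces variance
        # d log pi(z) / d logit_k = 1[k == z] - pi(k)
        for k in range(n_options):
            grad = (1.0 if k == z else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    # Toy reward: only option 2 is "correct".
    final_probs = reinforce_over_options(lambda z: 1.0 if z == 2 else 0.0, n_options=4)
    print([round(p, 3) for p in final_probs])
```

In this sketch the policy only has to choose among K options per step rather than over the full vocabulary, which is one plausible reading of why an optionized action space could help reward optimization; the paper's own argument may differ.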

Abstract

We introduce Options LLM (OLLM), a simple, general method that replaces the single next-token prediction of standard LLMs with a set of learned options for the next token, indexed by a discrete latent variable. Instead of relying on temperature or sampling heuristics to induce diversity, OLLM models variation explicitly: a small latent space parametrizes multiple plausible next-token options which can be selected or searched by a downstream policy. Architecturally, OLLM is a lightweight "plug-in" that inserts two layers, an encoder and a decoder, before the output head, allowing almost any pretrained LLM to be converted with minimal additional parameters. We apply OLLM to a 1.7B-parameter backbone (only 1.56% of parameters trainable) trained on OpenMathReasoning and evaluated on OmniMath. The SOTA LoRA-adapted baselines peak at 51% final-answer correctness, while OLLM's…
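The abstract describes the plug-in only at a high level. The minimal pure-Python sketch below illustrates the idea under explicit assumptions: an encoder maps the backbone's final hidden state h to prior logits over K discrete latent codes; a decoder combines h with a learned embedding of code z through a residual update; and the backbone's output head is reused unchanged, so each z yields its own next-token distribution. `OptionsHead`, `select_option`, the tanh residual update, and all dimensions are hypothetical, not the authors' code. For scale, 1.56% of a roughly 1.7B-parameter backbone is about 0.0156 × 1.7×10⁹ ≈ 26.5M trainable parameters.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(mat, vec):
    # Plain matrix-vector product for lists of lists.
    return [sum(w * v for w, v in zip(row, vec)) for row in mat]


def _rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class OptionsHead:
    """Hypothetical options head: encoder + decoder placed before a frozen output head."""

    def __init__(self, d_model, n_options, vocab_size, seed=0):
        rng = random.Random(seed)
        self.n_options = n_options
        self.enc = _rand_matrix(rng, n_options, d_model)       # h -> prior logits over z
        self.codes = _rand_matrix(rng, n_options, d_model)     # one embedding per latent z
        self.dec = _rand_matrix(rng, d_model, 2 * d_model)     # [h; e_z] -> hidden update
        self.out_head = _rand_matrix(rng, vocab_size, d_model)  # stands in for the pretrained LM head

    def prior(self, h):
        # Distribution over the K discrete options given the hidden state.
        return softmax(matvec(self.enc, h))

    def option_dist(self, h, z):
        # Residual update of h conditioned on code z, then the unchanged output head.
        update = matvec(self.dec, list(h) + self.codes[z])
        h_z = [hi + math.tanh(u) for hi, u in zip(h, update)]
        return softmax(matvec(self.out_head, h_z))

    def options(self, h):
        # One next-token distribution per latent option.
        return [self.option_dist(h, z) for z in range(self.n_options)]


def select_option(head, h, score):
    """Toy downstream policy: pick the option whose distribution scores highest."""
    dists = head.options(h)
    return max(range(len(dists)), key=lambda z: score(dists[z]))


if __name__ == "__main__":
    head = OptionsHead(d_model=8, n_options=4, vocab_size=10)
    h = [0.1 * i for i in range(8)]
    z = select_option(head, h, score=max)  # prefer the most confident option
    print("chosen option:", z)
```

Here diversity comes from enumerating K explicit distributions instead of raising the sampling temperature; a search procedure could expand several options per step instead of committing to the single best-scoring one.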