OLLM: Options-based Large Language Models
Shashank Sharma, Janina Hoffmann, Vinay Namboodiri

TL;DR
OLLM introduces a set of learned options for next-token prediction in LLMs, improving diversity, controllability, and efficiency in reasoning tasks by explicitly modeling plausible token options.
Contribution
The paper presents a lightweight plug-in method that replaces standard next-token prediction with a set of learned options, aiming to improve LLM robustness and controllability without extensive retraining.
Findings
OLLM achieves up to 70% correctness on math reasoning tasks.
Optionized modeling improves sample efficiency in reward optimization.
The method reduces common misalignments compared to baseline models.
Abstract
We introduce Options LLM (OLLM), a simple, general method that replaces the single next-token prediction of standard LLMs with a set of learned options for the next token, indexed by a discrete latent variable. Instead of relying on temperature or sampling heuristics to induce diversity, OLLM models variation explicitly: a small latent space parametrizes multiple plausible next-token options which can be selected or searched by a downstream policy. Architecturally, OLLM is a lightweight "plug-in" that inserts two layers, an encoder and a decoder, before the output head, allowing almost any pretrained LLM to be converted with minimal additional parameters. We apply OLLM to a 1.7B-parameter backbone (only of parameters trainable) trained on OpenMathReasoning and evaluated on OmniMath. The SOTA LoRA-adapted baselines peak at final answer correctness, while OLLM's…
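The idea of a discrete latent indexing several next-token options can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's implementation: the class name `ToyOptionsHead`, the random weights, the additive conditioning of the hidden state on a per-option embedding, and the greedy argmax per option are all hypothetical. The encoder mentioned in the abstract is omitted because this summary does not describe how it is used.

```python
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (toy stand-in for trained layers)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyOptionsHead:
    """Hypothetical decoder placed before the output head, conditioned on a discrete option index."""

    def __init__(self, hidden_dim, vocab_size, num_options, seed=0):
        rng = random.Random(seed)
        self.num_options = num_options
        # One embedding per discrete latent value (assumed conditioning mechanism).
        self.latent_embeddings = make_matrix(num_options, hidden_dim, rng)
        self.decoder = make_matrix(hidden_dim, hidden_dim, rng)
        self.output_head = make_matrix(vocab_size, hidden_dim, rng)

    def logits(self, hidden, option):
        """Score every vocabulary token for one choice of the latent option."""
        conditioned = [h + z for h, z in zip(hidden, self.latent_embeddings[option])]
        decoded = matvec(self.decoder, conditioned)
        return matvec(self.output_head, decoded)

    def options(self, hidden):
        """Return the highest-scoring token id for each latent option."""
        result = []
        for k in range(self.num_options):
            scores = self.logits(hidden, k)
            result.append(max(range(len(scores)), key=scores.__getitem__))
        return result


head = ToyOptionsHead(hidden_dim=4, vocab_size=10, num_options=3)
candidates = head.options([0.5, -0.2, 0.1, 0.9])
print(candidates)  # one candidate token id per option
```

In this sketch a downstream policy would pick among `candidates` (or search over option indices across several steps) instead of sampling from a single distribution with a temperature.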
