Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Ngoc Trinh Hung Nguyen; Alonso Silva; Laith Zumot; Liubov Tupikina; Armen Aghasaryan; Mehwish Alam

arXiv:2601.07525 · cs.CL · January 13, 2026

Open Access

TL;DR

This paper introduces a hybrid decoding framework for large language models that combines free-form reasoning with structured output generation, improving accuracy and reliability on tasks that require structured responses.

Contribution

The paper proposes a simple method that lets the model reason in natural language and switches to structured (constrained) decoding once a trigger token is generated, balancing expressive reasoning with parsable output.

Findings

01. Achieves up to 27% accuracy improvement over natural generation.

02. Requires only 10-20 extra tokens for structured outputs.

03. Effective across classification and reasoning datasets.

Abstract

Natural generation allows Language Models (LMs) to produce free-form responses with rich reasoning, but the lack of guaranteed structure makes outputs difficult to parse or verify. Structured generation, or constrained decoding, addresses this drawback by producing content in standardized formats such as JSON, ensuring consistency and guaranteed-parsable outputs, but it can inadvertently restrict the model's reasoning capabilities. In this work, we propose a simple approach that combines the advantages of both natural and structured generation. By allowing LLMs to reason freely until specific trigger tokens are generated, and then switching to structured generation, our method preserves the expressive power of natural language reasoning while ensuring the reliability of structured outputs. We further evaluate our approach on several datasets, covering both classification and reasoning…
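
The abstract describes the switching mechanism only at a high level. The Python below is a minimal sketch of that idea, not the authors' implementation. It assumes a hypothetical trigger token `<answer>`, a stand-in "model" (`toy_model`) that returns ranked candidate tokens, and a closed set of valid outputs (`ALLOWED`) in place of a real JSON grammar. All of these names are illustrative. Decoding is greedy and unconstrained until the trigger appears; afterwards, any candidate that cannot extend a valid structured output is masked out.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical trigger token; the triggers used in the paper are not given here.
TRIGGER = "<answer>"

# Hypothetical closed set of valid outputs, standing in for a JSON schema or grammar.
ALLOWED = ['{"label": "positive"}', '{"label": "negative"}']


def allowed_next(prefix: str, candidates: List[str]) -> List[str]:
    """Keep candidates for which prefix + token is still a prefix of an allowed output."""
    return [t for t in candidates
            if t and any(a.startswith(prefix + t) for a in ALLOWED)]


def hybrid_decode(propose: Callable[[List[str]], List[str]],
                  max_steps: int = 64) -> Tuple[str, str]:
    """Decode freely until TRIGGER is generated, then constrain to ALLOWED.

    `propose(tokens)` stands in for a language model: it returns candidate
    next tokens ranked from most to least preferred.
    """
    tokens: List[str] = []
    structured = ""
    constrained = False
    for _ in range(max_steps):
        candidates = propose(tokens)
        if constrained:
            # Structured phase: mask candidates that would break the format.
            candidates = allowed_next(structured, candidates)
            if not candidates:
                raise ValueError("no candidate satisfies the constraint")
            structured += candidates[0]
            tokens.append(candidates[0])
            if structured in ALLOWED:
                break
        else:
            # Natural phase: take the model's top choice unchanged.
            tokens.append(candidates[0])
            if candidates[0] == TRIGGER:
                constrained = True
    reasoning = tokens[: tokens.index(TRIGGER)] if TRIGGER in tokens else tokens
    return " ".join(reasoning), structured


def toy_model(tokens: List[str]) -> List[str]:
    """Scripted stand-in for an LM: free-text reasoning, then chatty candidates."""
    script = ["The", "review", "praises", "the", "plot", ".", TRIGGER]
    if len(tokens) < len(script):
        return [script[len(tokens)]]
    # After the trigger, the top-ranked candidates are not valid JSON pieces.
    return ["Sure", "!", '{"', "label", '": "', "positive", "negative", '"}']


reasoning, answer = hybrid_decode(toy_model)
print(reasoning)
print(json.loads(answer))
```

In this toy run the structured part always parses with `json.loads`, even though the stand-in model ranks "Sure" and "!" first after the trigger. A real system would apply the same masking to token logits and derive the allowed continuations from a schema or grammar rather than from a fixed list.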

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques