TL;DR
Requesting structured output (JSON, XML, LaTeX, Markdown) substantially degrades reasoning and writing performance in open-weight large language models, but decoupling reasoning from formatting recovers most of the lost accuracy.
Contribution
The paper shows that format-requesting instructions in the prompt, rather than constrained decoding, cause most of the accuracy loss in open-weight models, and proposes decoupling reasoning from formatting to recover it.
Findings
Decoupling reasoning from formatting recovers most accuracy loss.
Format requests cause significant degradation in open-weight models.
Closed-weight models show little to no format tax.
Abstract
Asking a large language model to respond in JSON should be a formatting choice, not a capability tax. Yet we find that structured output requirements -- JSON, XML, LaTeX, Markdown -- substantially degrade reasoning and writing performance across open-weight models. The research response has focused on constrained decoding, but sampling bias accounts for only a fraction of the degradation. The dominant cost enters at the prompt: format-requesting instructions alone cause most of the accuracy loss, before any decoder constraint is applied. This diagnosis points to a simple principle: decouple reasoning from formatting. Whether by generating freeform first and reformatting in a second pass, or by enabling extended thinking within a single generation, separating the two concerns substantially recovers lost accuracy. Across six open-weight models, four API models, four formats, and tasks…
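The two-pass variant mentioned in the abstract (generate freeform first, reformat second) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical callable standing in for any model call, the prompt wording and the JSON keys are invented for the example, and `fake_generate` is a stub so the sketch runs offline.

```python
import json
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a completion string.
Generate = Callable[[str], str]


def answer_two_pass(question: str, generate: Generate) -> dict:
    """Reason without format constraints, then reformat in a separate call."""
    # Pass 1: freeform reasoning. The prompt carries no format instruction.
    freeform = generate(
        f"Answer the question. Think step by step.\n\nQuestion: {question}"
    )
    # Pass 2: formatting only. The model is asked to restructure, not to re-solve.
    formatted = generate(
        'Convert the answer below into JSON with keys "answer" and '
        '"explanation". Do not change its content.\n\n' + freeform
    )
    return json.loads(formatted)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with your own client)."""
    if prompt.startswith("Convert"):
        return json.dumps({"answer": "4", "explanation": "2 + 2 = 4"})
    return "2 + 2 = 4, so the answer is 4."


result = answer_two_pass("What is 2 + 2?", fake_generate)
```

The point of the design is that the first prompt never mentions the target format, so any cost attached to format-requesting instructions is confined to the second call, which only has to restructure text that already exists.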
