JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models
Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, Harsha Nori

TL;DR
This paper introduces JSONSchemaBench, a comprehensive benchmark for evaluating constrained decoding methods in language models, focusing on efficiency, coverage, and quality across real-world JSON schemas.
Contribution
It presents a systematic evaluation framework and a large-scale benchmark for constrained decoding, providing insights into the performance and limitations of current methods in structured generation tasks.
Findings
Guidance and Outlines frameworks show high constraint compliance.
Performance varies significantly across schemas of differing complexity.
The benchmark reveals key limitations in current constrained decoding methods.
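The findings above hinge on constraint compliance, here taken to mean that a generated output parses as JSON and validates against its schema. The following is a minimal sketch of that check, assuming only a small subset of JSON Schema keywords (`type` as a single string, `enum`, `properties`, `required`, `items`). The names `conforms` and `is_compliant` are hypothetical, and a real evaluation would use a complete validator rather than this subset.

```python
import json
from typing import Any

# Hypothetical helper table covering a small subset of JSON Schema "type" values.
_TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def conforms(value: Any, schema: dict) -> bool:
    """Return True if value satisfies the supported subset of the schema (not a full validator)."""
    expected = schema.get("type")
    if expected is not None and not _TYPE_CHECKS[expected](value):
        return False
    if "enum" in schema and value not in schema["enum"]:
        return False
    if isinstance(value, dict):
        # Every required key must be present.
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Each declared property that is present must conform recursively.
        for key, sub in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub):
                return False
    if isinstance(value, list) and "items" in schema:
        # Every array element must conform to the "items" subschema.
        return all(conforms(item, schema["items"]) for item in value)
    return True


def is_compliant(output: str, schema: dict) -> bool:
    """An output counts as compliant if it parses as JSON and conforms to the schema."""
    try:
        value = json.loads(output)
    except json.JSONDecodeError:
        return False
    return conforms(value, schema)


if __name__ == "__main__":
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name"],
    }
    print(is_compliant('{"name": "Ada", "age": 36}', schema))  # True
    print(is_compliant('{"age": "36"}', schema))  # False
```

Separating parsing from validation makes it easy to distinguish outputs that are not JSON at all from outputs that are valid JSON but violate the schema.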
Abstract
Reliably generating structured outputs has become a critical capability for modern language model (LM) applications. Constrained decoding has emerged as the dominant technology across sectors for enforcing structured outputs during generation. Despite its growing adoption, little has been done to systematically evaluate the behavior and performance of constrained decoding. Constrained decoding frameworks have standardized around JSON Schema as a structured data format, with most implementations guaranteeing constraint compliance given a schema. However, the effectiveness of these methods in practice is poorly understood. We present an evaluation framework to assess constrained decoding approaches across three critical dimensions: efficiency in generating constraint-compliant outputs, coverage of diverse constraint types, and quality of the generated outputs. To facilitate this…
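As a rough illustration of what constrained decoding does mechanically, the sketch below masks, at each generation step, every token that cannot extend the current prefix toward a valid output, then picks the highest-scoring remaining token. It is a toy under stated assumptions: the "language model" is a fixed score table (`toy_scores`), the schema `{"enum": ["red", "green"]}` is represented by its finite set of serialized outputs, the vocabulary is assumed able to spell every valid output, and all names are hypothetical. Production frameworks typically compile the schema into a grammar or automaton so that the mask can be computed efficiently, even for unbounded languages.

```python
from typing import Callable, Dict, List, Set

EOS = "<eos>"


def allowed_tokens(prefix: str, vocab: List[str], language: Set[str]) -> List[str]:
    """Tokens that keep the prefix extendable to some string in the language."""
    allowed = [
        tok for tok in vocab
        if tok != EOS and any(s.startswith(prefix + tok) for s in language)
    ]
    if prefix in language:
        allowed.append(EOS)  # Stopping is legal only on a complete valid string.
    return allowed


def constrained_greedy_decode(
    score: Callable[[str], Dict[str, float]],
    vocab: List[str],
    language: Set[str],
    max_steps: int = 32,
) -> str:
    """Greedy decoding where disallowed tokens are never considered."""
    prefix = ""
    for _ in range(max_steps):
        logits = score(prefix)
        mask = allowed_tokens(prefix, vocab, language)
        if not mask:
            raise ValueError(f"vocabulary cannot continue prefix {prefix!r}")
        # Masking: only allowed tokens compete, as if all others had logit -inf.
        best = max(mask, key=lambda tok: logits.get(tok, float("-inf")))
        if best == EOS:
            break
        prefix += best
    return prefix


def toy_scores(prefix: str) -> Dict[str, float]:
    """Toy stand-in for an LM: it prefers "blue", which the schema forbids."""
    return {'"': 1.0, "blue": 5.0, "bl": 4.0, "gr": 2.0, "een": 2.0, "red": 1.5, EOS: 0.5}


if __name__ == "__main__":
    # Serialized values allowed by the schema {"enum": ["red", "green"]}.
    language = {'"red"', '"green"'}
    vocab = ['"', "red", "gr", "een", "bl", "blue"]
    print(constrained_greedy_decode(toy_scores, vocab, language))  # prints "green"
```

Unconstrained greedy decoding with these scores would emit the invalid token `blue`; with the mask, the highest-scoring valid path yields `"green"`, which is guaranteed to be in the allowed set. The cost of computing the mask at every step is one reason efficiency is evaluated separately from compliance.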
Taxonomy
Topics: Natural Language Processing Techniques
