JSONSchemaBench: A Rigorous Benchmark of Structured Outputs for Language Models

Saibo Geng; Hudson Cooper; Michał Moskal; Samuel Jenkins; Julian Berman; Nathan Ranchin; Robert West; Eric Horvitz; Harsha Nori

arXiv:2501.10868 · cs.CL · February 28, 2025 · 2 cites


TL;DR

This paper introduces JSONSchemaBench, a comprehensive benchmark for evaluating constrained decoding methods in language models, focusing on efficiency, coverage, and quality across real-world JSON schemas.

Contribution

It presents a systematic evaluation framework and a large-scale benchmark for constrained decoding, providing insights into their performance and limitations in structured generation tasks.

Findings

01. Guidance and Outlines frameworks show high constraint compliance.

02. Performance varies significantly across different schema complexities.

03. The benchmark reveals key limitations in current constrained decoding methods.

Abstract

Reliably generating structured outputs has become a critical capability for modern language model (LM) applications. Constrained decoding has emerged as the dominant technology across sectors for enforcing structured outputs during generation. Despite its growing adoption, little work has systematically evaluated the behavior and performance of constrained decoding. Constrained decoding frameworks have standardized around JSON Schema as a structured data format, with most uses guaranteeing constraint compliance given a schema. However, the effectiveness of these methods in practice is poorly understood. We present an evaluation framework to assess constrained decoding approaches across three critical dimensions: efficiency in generating constraint-compliant outputs, coverage of diverse constraint types, and quality of the generated outputs. To facilitate this…
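
One basic ingredient of any evaluation like the one described above is checking whether a generated string parses as JSON and satisfies its schema. The following is a minimal sketch of that check, not the paper's evaluation code: the names `is_compliant` and `compliance_rate` are hypothetical, the validator handles only a small, illustrative subset of JSON Schema (`type`, `required`, `properties`, `items`), and the rate it computes is a simple stand-in rather than the benchmark's own coverage metric, which this excerpt does not define.

```python
import json

# Illustrative subset only: real JSON Schema has many more keywords
# (enum, pattern, $ref, anyOf, ...) that this sketch ignores.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def is_compliant(instance, schema):
    """Return True if `instance` satisfies the supported subset of `schema`."""
    expected = schema.get("type")
    # Only single string-valued "type" keywords are handled here.
    if isinstance(expected, str) and expected in TYPE_CHECKS:
        if not TYPE_CHECKS[expected](instance):
            return False
    if isinstance(instance, dict):
        # Every required key must be present.
        if any(key not in instance for key in schema.get("required", [])):
            return False
        # Present keys with a declared subschema must satisfy it.
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not is_compliant(instance[key], subschema):
                return False
    if isinstance(instance, list) and "items" in schema:
        return all(is_compliant(item, schema["items"]) for item in instance)
    return True


def compliance_rate(pairs):
    """Fraction of (schema, raw_output) pairs whose output parses and complies."""
    if not pairs:
        return 0.0
    passed = 0
    for schema, raw_output in pairs:
        try:
            instance = json.loads(raw_output)
        except json.JSONDecodeError:
            continue  # Unparseable output counts as non-compliant.
        if is_compliant(instance, schema):
            passed += 1
    return passed / len(pairs)


# Hypothetical example data, not taken from the benchmark.
person_schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
outputs = [
    '{"name": "Ada", "age": 36}',  # compliant
    '{"name": "Ada"}',             # missing required "age"
    '{"name": "Ada", "age": ',     # truncated, not valid JSON
]
rate = compliance_rate([(person_schema, out) for out in outputs])
print(f"compliance rate: {rate:.2f}")
```

In practice a full validator that implements the complete JSON Schema specification would replace `is_compliant`; the hand-rolled subset is used here only to keep the sketch dependency-free.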


Code & Models

Repositories

Datasets

epfl-dlab/JSONSchemaBench
dataset · 3.5k dl


Taxonomy

Topics: Natural Language Processing Techniques