Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
Ivan Matveev

TL;DR
This paper benchmarks Token-Oriented Object Notation (TOON) against JSON for structured data serialization in large language models, evaluating token efficiency and accuracy across various generation methods and complexities.
Contribution
It provides the first comprehensive comparison of TOON and JSON generation, highlighting their relative efficiencies and accuracy trade-offs in in-context learning scenarios.
Findings
TOON offers a promising accuracy/token ratio for in-domain tasks.
Plain JSON achieves the highest accuracy in one-shot generation.
Constrained decoding reduces token usage but can decrease accuracy.
Abstract
Recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While showing solid accuracy in LLM comprehension, there is a lack of tests against JSON generation. Though never present in training data, TOON syntax is simple enough to suggest one-shot in-context learning could support accurate generation. The inevitable prompt overhead can be an acceptable trade-off for shorter completions. To test this, we conducted a benchmark creating several test cases with regard to structural complexity, a validation pipeline, and comparing plain JSON generation vs structured output (via constrained decoding) JSON generation vs TOON one-shot in-context learning generation. JSON structured output was included to establish a minimum token budget baseline and to set a starting point…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
