Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence

Bhavik Agarwal; Ishan Joshi; Viktoria Rojkova

arXiv:2502.14905·cs.CL·February 24, 2025

TL;DR

This paper presents a reinforcement learning approach to improve large language models' ability to strictly follow predefined schemas, using a resource-efficient pipeline that combines synthetic data and custom rewards.

Contribution

It introduces a novel reinforcement learning pipeline with synthetic reasoning data and custom reward functions to enhance schema adherence in LLMs, building on the DeepSeek R1 framework.

Findings

01

Model effectively enforces schema consistency in text generation.

02

Resource-efficient training requires approximately 20 hours on an 8xH100 GPU cluster.

03

Outperforms comparable models in real-world schema adherence tasks.

Abstract

In this paper, we address the challenge of enforcing strict schema adherence in large language model (LLM) generation by leveraging LLM reasoning capabilities. Building on the DeepSeek R1 reinforcement learning framework, our approach trains the structured reasoning skills of a 1.5B-parameter model through a novel pipeline that combines synthetic reasoning dataset construction with custom reward functions under Group Relative Policy Optimization (GRPO). Specifically, we first perform R1 reinforcement learning on a 20K-sample unstructured-to-structured dataset, mirroring the original DeepSeek R1 methods, to establish core reasoning abilities. Subsequently, we perform supervised fine-tuning on a separate 10K-sample reasoning dataset, focusing on refining schema adherence for downstream tasks. Despite the relatively modest training scope, requiring approximately 20 hours on an 8xH100 GPU…
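
The abstract mentions custom reward functions under GRPO but does not spell them out here. The Python sketch below is only an illustration of the general idea, not the authors' implementation: `SCHEMA`, `schema_reward`, and `group_relative_advantages` are hypothetical names, the scoring rule (fraction of correctly typed fields, penalized for extra keys) is an assumption, and the advantage step uses the commonly described GRPO normalization of each reward by the mean and standard deviation of its sampling group.

```python
import json
import statistics

# Hypothetical target schema (an assumption for illustration):
# field name -> exact Python type expected after JSON parsing.
SCHEMA = {"name": str, "quantity": int, "price": float}


def schema_reward(output: str, schema: dict) -> float:
    """Score one model output in [0, 1] for schema adherence.

    Assumed rule: 0 for anything that is not a JSON object; otherwise the
    number of schema fields present with the exact expected type, divided
    by the number of schema fields plus the number of unexpected keys.
    """
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict):
        return 0.0
    # Exact type comparison so that booleans do not pass as integers.
    matched = sum(
        1 for key, expected in schema.items()
        if key in obj and type(obj[key]) is expected
    )
    extra = sum(1 for key in obj if key not in schema)
    return matched / (len(schema) + extra)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps); the
    population standard deviation is used here as an assumption.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # A hypothetical group of completions sampled for a single prompt.
    group = [
        '{"name": "bolt", "quantity": 3, "price": 0.5}',
        '{"name": "bolt", "quantity": "3"}',
        "not json at all",
    ]
    rewards = [schema_reward(text, SCHEMA) for text in group]
    for text, adv in zip(group, group_relative_advantages(rewards)):
        print(f"{adv:+.3f}  {text}")
```

In this sketch, completions that adhere to the schema better than the rest of their group receive positive advantages and the others receive negative ones, which is the signal a GRPO-style update would use. A production reward would more likely validate against a full JSON Schema and might also score the reasoning format.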
