RL-Struct: A Lightweight Reinforcement Learning Framework for Reliable Structured Output in LLMs
Ruike Hu, Shulei Wu

TL;DR
RL-Struct is a lightweight reinforcement learning framework that improves the structural accuracy of LLM outputs by aligning them with deterministic schemas, reducing peak VRAM by 38% compared to PPO and outperforming SFT and zero-shot baselines.
Contribution
It introduces RL-Struct, a novel RL framework with a hierarchical reward function that enhances LLM structural compliance without needing a critic network.
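The summary does not give the exact tiers of the hierarchical reward, so the following is only a minimal sketch of what such a reward could look like for JSON output. The function name `hierarchical_reward`, the `required_keys` parameter, and the tier weights (0.4, 0.2, 0.4) are illustrative assumptions, not values from the paper.

```python
import json


def hierarchical_reward(text, required_keys):
    """Tiered reward for one completion (hypothetical weights).

    Tier 1: the text parses as JSON.
    Tier 2: the top-level value is a JSON object.
    Tier 3: partial credit for each required key that is present.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return 0.0  # not parseable: no reward at all

    reward = 0.4  # tier 1: syntactically valid JSON
    if not isinstance(parsed, dict):
        return reward  # valid JSON, but not an object

    reward += 0.2  # tier 2: top-level object
    if required_keys:
        present = sum(1 for key in required_keys if key in parsed)
        reward += 0.4 * present / len(required_keys)  # tier 3: key coverage
    return reward
```

Because the lower tiers gate the higher ones, a completion earns nothing for key coverage until it parses, which is one plausible way a syntax-first learning order could arise.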
Findings
Achieves 89.7% structural accuracy on JSON tasks
Reduces peak VRAM by 38% compared to PPO
Outperforms SFT and zero-shot baselines
Abstract
The Structure Gap between probabilistic LLM generation and deterministic schema requirements hinders automated workflows. We propose RL-Struct, a lightweight framework using Gradient Regularized Policy Optimization (GRPO) with a hierarchical reward function to align LLMs with structural constraints. This approach eliminates the critic network, reducing peak VRAM by 38% compared to PPO. On complex JSON tasks, RL-Struct achieves 89.7% structural accuracy and 92.1% validity, significantly outperforming SFT and zero-shot baselines. We also report an emergent curriculum, a self-organized learning process in which the model prioritizes syntax before semantics. Our model is publicly available at https://huggingface.co/Freakz3z/Qwen-JSON.
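The abstract does not spell out how the critic is removed. A common critic-free approach, and the one usually meant by the abbreviation GRPO (Group Relative Policy Optimization), samples several completions per prompt and uses the group's own reward statistics as the baseline. The sketch below assumes that behavior; `group_advantages` and `eps` are hypothetical names, and this is not confirmed to be the authors' implementation.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    The group mean acts as the baseline that a learned critic would
    otherwise provide, so no value network has to be kept in memory.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For example, `group_advantages([1.0, 0.0])` gives roughly `[1.0, -1.0]`, while a group whose completions all receive the same reward gets zero advantage everywhere and contributes no gradient signal.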
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Software Engineering Research
