ATLAS-RTC: Closing the Loop on LLM Agent Output with Token-Level Runtime Control
Christopher Cruz

TL;DR
ATLAS-RTC is a runtime control system for autoregressive language models that enforces structured output during decoding. By monitoring generation and intervening at each step, it raises first-attempt success rates and reduces latency.
Contribution
It introduces a closed-loop runtime control mechanism that detects and corrects decoding drift while generation is still in progress, in contrast to static constrained decoding and post-hoc validation.
Findings
Increases first-attempt success rates by 20 to 37.8 percentage points across structured generation and tool-calling tasks.
Reduces latency by up to 88% in failure-dominated settings.
Many failures are due to decoding artifacts, not task misunderstanding.
Abstract
We present ATLAS-RTC, a runtime control system for autoregressive language models that enforces structured output during decoding. ATLAS-RTC monitors generation at each step, detects drift from output contracts using lightweight signals, and applies targeted interventions such as biasing, masking, and rollback. Unlike post-hoc validation or static constrained decoding, it operates in a closed loop, enabling correction before errors materialize. Across structured generation and tool-calling tasks, ATLAS-RTC improves first-attempt success rates by 20 to 37.8 percentage points, with up to 88% latency reduction in failure-dominated settings. Results show that many failures arise from decoding artifacts rather than task misunderstanding, motivating runtime control as a distinct layer in LLM systems.
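The monitor, detect, and intervene loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the drift signal (softmax mass on contract-violating tokens), the thresholds `mask_above` and `bias`, the toy model `toy_logits`, and the toy contract `toy_allowed` are all hypothetical names and values chosen for illustration. Decoding is greedy so the run is deterministic.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Logits = Dict[str, float]


def illegal_mass(logits: Logits, legal: Set[str]) -> float:
    """Drift signal (assumed): softmax mass on contract-violating tokens."""
    z = max(logits.values())
    exp = {t: math.exp(v - z) for t, v in logits.items()}
    return sum(e for t, e in exp.items() if t not in legal) / sum(exp.values())


def controlled_decode(
    next_logits: Callable[[List[str]], Logits],
    allowed: Callable[[List[str]], Set[str]],
    max_steps: int = 16,
    mask_above: float = 0.8,  # hypothetical threshold for hard masking
    bias: float = 0.3,        # hypothetical logit bonus for legal tokens
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Greedy decoding with a closed control loop; assumes every legal
    token is present in the model's vocabulary."""
    out: List[str] = []
    events: List[Tuple[int, str]] = []
    for step in range(max_steps):
        legal = allowed(out)
        if not legal:  # contract satisfied: nothing more may be emitted
            break
        logits = dict(next_logits(out))
        drift = illegal_mass(logits, legal)
        if drift > mask_above:
            # Strong drift: remove every illegal token before choosing.
            logits = {t: v for t, v in logits.items() if t in legal}
            events.append((step, "mask"))
        elif drift > 0.0:
            # Mild drift: nudge legal tokens upward, keep the rest available.
            logits = {t: v + bias if t in legal else v for t, v in logits.items()}
            events.append((step, "bias"))
        out.append(max(logits, key=logits.get))
        if out[-1] not in legal:
            # Biasing was not enough: undo the token, re-pick among legal ones.
            out.pop()
            legal_only = {t: v for t, v in logits.items() if t in legal}
            out.append(max(legal_only, key=legal_only.get))
            events.append((step, "rollback"))
    return out, events


# Toy contract: the output must be exactly this token sequence.
EXPECTED = ["{", '"name"', ":", '"atlas"', "}"]
VOCAB = EXPECTED + ["Sure", "!"]


def toy_allowed(prefix: List[str]) -> Set[str]:
    return {EXPECTED[len(prefix)]} if len(prefix) < len(EXPECTED) else set()


def toy_logits(prefix: List[str]) -> Logits:
    """Toy model that mostly follows the contract but drifts at steps 0 and 3."""
    step = len(prefix)
    logits = {t: 0.0 for t in VOCAB}
    logits[EXPECTED[step]] = 4.0
    if step == 0:  # chatty preamble strongly preferred over "{"
        logits["{"], logits["Sure"] = 1.0, 3.0
    if step == 3:  # stray "!" slightly preferred over the value token
        logits['"atlas"'], logits["!"] = 1.0, 1.5
    return logits


tokens, events = controlled_decode(toy_logits, toy_allowed)
print("".join(tokens))  # {"name":"atlas"}
print(events)
```

In this toy run the controller masks at step 0, where almost all probability mass sits on the illegal preamble, and rolls back at step 3, where the small bias fails to outrank the stray token. Uncontrolled greedy decoding of the same toy model would have started with `Sure`, a failure of output form rather than of task understanding.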
