When Agents Go Quiet: Output Generation Capacity and Format-Cost Separation for LLM Document Synthesis

Justice Owusu Agyemang; Michael Agyare; Miriam Kobbinah; Nathaniel Agbugblah; Prosper Addo

arXiv:2604.16736·cs.AI·April 21, 2026

TL;DR

This paper introduces a theoretical framework and practical methods to prevent output stalling in LLM agents during large document synthesis, significantly reducing token usage and improving reliability.

Contribution

The authors develop the Output Generation Capacity measure, prove a Format-Cost Separation Theorem, and formalize an adaptive strategy for optimal output generation, validated across multiple models and document types.

Findings

01. Deferred rendering reduces token usage by 48-72%.

02. Output stalling is eliminated with the proposed methods.

03. The framework is implemented as an open-source tool, GEN-PILOT.

Abstract

LLM-powered coding agents suffer from a poorly understood failure mode we term output stalling: the agent silently produces empty responses when attempting to generate large, format-heavy documents. We present a theoretical framework that explains and prevents this failure through three contributions. (1) We introduce Output Generation Capacity (OGC), a formal measure of an agent's effective ability to produce output given its current context state, distinct from and empirically smaller than the raw context window. (2) We prove a Format-Cost Separation Theorem showing that deferred template rendering is always at least as token-efficient as direct generation for any format with overhead multiplier $\mu_f > 1$, and derive tight bounds on the savings. (3) We formalize Adaptive Strategy Selection, a decision framework that maps the ratio of estimated output cost to available OGC into an…
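To make the format-cost idea in the abstract concrete, here is a minimal Python sketch under a deliberately simplified cost model. It is an illustration, not the paper's formulation. The assumptions are that direct generation costs `mu_f` times the content tokens and that deferred rendering costs only the content tokens, because a renderer outside the model applies the format. The function names (`direct_cost`, `deferred_cost`, `savings_fraction`, `cost_to_capacity_ratio`) are hypothetical. The paper's exact definitions of OGC and of its bounds are not given in this summary.

```python
def direct_cost(content_tokens: int, mu_f: float) -> float:
    """Tokens emitted when the model writes the fully formatted document.

    Assumption: format overhead scales content by a multiplier mu_f > 1.
    """
    if mu_f <= 1:
        raise ValueError("mu_f must be greater than 1 for a format with overhead")
    return mu_f * content_tokens


def deferred_cost(content_tokens: int) -> float:
    """Tokens emitted when the model writes content only.

    Assumption: the template is rendered outside the model at no token cost.
    """
    return float(content_tokens)


def savings_fraction(content_tokens: int, mu_f: float) -> float:
    """Fraction of output tokens saved by deferring rendering (simplified model)."""
    direct = direct_cost(content_tokens, mu_f)
    return 1.0 - deferred_cost(content_tokens) / direct


def cost_to_capacity_ratio(estimated_cost: float, available_ogc: float) -> float:
    """Ratio of estimated output cost to available OGC.

    How this ratio maps to a strategy is not specified in this summary.
    """
    if available_ogc <= 0:
        raise ValueError("available_ogc must be positive")
    return estimated_cost / available_ogc


if __name__ == "__main__":
    # Under this toy model the saving depends only on mu_f: 1 - 1/mu_f.
    for mu in (1.5, 2.0, 4.0):
        print(mu, round(savings_fraction(1000, mu), 3))
```

Under these assumptions the deferred cost never exceeds the direct cost whenever `mu_f > 1`, which matches the direction of the theorem as stated in the abstract. The paper's tight bounds may include terms this sketch omits.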
