Prompt Compression in Production Task Orchestration: A Pre-Registered Randomized Trial

Warren Johnson; Charles Lee

arXiv:2603.23525·cs.CL·March 26, 2026

Open Access

TL;DR

This study evaluates prompt compression strategies in production multi-agent task orchestration. Moderate compression reduces total inference cost, whereas aggressive compression can increase it because outputs expand, so compression policies need careful design.

Contribution

The paper provides the first empirical analysis of how prompt compression affects production cost and output quality in multi-agent orchestration. It introduces two structure-aware strategies (entropy-adaptive and recency-weighted) and examines the trade-off between cost and response similarity.

Findings

01

Moderate compression (r=0.5) reduces total inference cost by 27.9%.

02

Aggressive compression (r=0.2) increases total cost by 1.8%.

03

Recency-weighted compression achieves 23.5% cost savings while balancing cost against response similarity.

Abstract

The economics of prompt compression depend not only on reducing input tokens but on how compression changes output length, which is typically priced several times higher. We evaluate this in a pre-registered six-arm randomized controlled trial of prompt compression on production multi-agent task orchestration, analyzing 358 successful Claude Sonnet 4.5 runs (59-61 per arm) drawn from a randomized corpus of 1,199 real orchestration instructions. We compare an uncompressed control with three uniform retention rates (r=0.8, 0.5, 0.2) and two structure-aware strategies (entropy-adaptive and recency-weighted), measuring total inference cost (input+output) and embedding-based response similarity. Moderate compression (r=0.5) reduced mean total cost by 27.9%, while aggressive compression (r=0.2) increased mean cost by 1.8% despite substantial input reduction, consistent with small mean output…
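
The cost argument in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the authors' code: the Pricing class, the function names, the 5x output-to-input price ratio, and the token counts are all assumptions chosen for the example, not values from the paper. It models total cost as input tokens times the input price plus output tokens times the output price, and computes how much the output may grow before the input savings at retention rate r are cancelled out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-token prices (arbitrary units), not real vendor rates."""
    input_per_token: float
    output_per_token: float


def total_cost(input_tokens: float, output_tokens: float, pricing: Pricing) -> float:
    """Total inference cost = input cost + output cost."""
    return (input_tokens * pricing.input_per_token
            + output_tokens * pricing.output_per_token)


def relative_cost_change(input_tokens: float, output_tokens: float,
                         retention: float, output_growth: float,
                         pricing: Pricing) -> float:
    """Fractional change in total cost when the prompt keeps `retention`
    of its tokens and the response length is multiplied by `output_growth`."""
    baseline = total_cost(input_tokens, output_tokens, pricing)
    compressed = total_cost(input_tokens * retention,
                            output_tokens * output_growth, pricing)
    return (compressed - baseline) / baseline


def break_even_output_growth(input_tokens: float, output_tokens: float,
                             retention: float, pricing: Pricing) -> float:
    """Output growth factor at which input savings are exactly offset."""
    saved = (1.0 - retention) * input_tokens * pricing.input_per_token
    return 1.0 + saved / (output_tokens * pricing.output_per_token)


if __name__ == "__main__":
    # Assumed scenario: output priced 5x input, 2000 input and 500 output tokens.
    pricing = Pricing(input_per_token=1.0, output_per_token=5.0)
    for r in (0.8, 0.5, 0.2):
        g = break_even_output_growth(2000, 500, r, pricing)
        print(f"r={r}: cost rises once output grows by more than {g - 1:.0%}")
```

Under these assumed numbers, the break-even growth factor depends on the ratio of input spend to output spend: the more output-heavy the workload, the less output expansion it takes to erase the savings from a shorter prompt.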

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Embodied and Extended Cognition · Logic, programming, and type systems