PoC: Performance-oriented Context Compression for Large Language Models via Performance Prediction

Runsong Zhao; Shilei Liu; Jiwei Tang; Langming Liu; Haibin Chen; Weidong Zhang; Yujin Yuan; Tong Xiao; Jingbo Zhu; Wenbo Su; Bo Zheng

arXiv:2603.19733·cs.CL·March 23, 2026

TL;DR

This paper introduces Performance-oriented Context Compression (PoC) for LLMs: instead of fixing a target compression ratio, developers specify a minimum acceptable performance, and PoC compresses the context as aggressively as that floor allows, improving reliability and efficiency over traditional ratio-based methods.

Contribution

The paper proposes a performance-aware compression framework built around a lightweight performance predictor, in context-agnostic and context-aware variants, to better balance compression and performance in LLM deployment.

Findings

01. The context-aware predictor reduces prediction error.

02. PoC achieves better overall performance.

03. PoC improves the reliability of context compression.

Abstract

While context compression can mitigate the growing inference costs of Large Language Models (LLMs) by shortening contexts, existing methods that specify a target compression ratio or length suffer from unpredictable performance degradation, hindering their reliable deployment. We introduce a paradigm shift to Performance-oriented Context Compression (PoC), where developers specify an acceptable performance floor instead of a compression ratio. PoC employs a lightweight performance predictor to automatically find the most aggressive compression ratio that satisfies this constraint before steering an off-the-shelf compressor. We design and compare two predictor variants: a simple context-agnostic predictor and a more sophisticated context-aware one that considers the input's inherent compressibility. On both question-answering and summarization benchmarks, the context-aware predictor…
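
The selection step described in the abstract (pick the most aggressive compression whose predicted performance still meets the floor) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pick_keep_ratio`, the candidate grid, the toy predictor, and the convention that a "keep ratio" is the fraction of the context retained are all assumptions made here for illustration.

```python
from typing import Callable, Sequence


def pick_keep_ratio(
    predict: Callable[[float], float],
    floor: float,
    candidates: Sequence[float] = (0.1, 0.2, 0.3, 0.5, 0.75, 1.0),
) -> float:
    """Return the smallest keep ratio whose predicted performance meets the floor.

    `predict` is a hypothetical stand-in for a performance predictor: it maps
    a keep ratio (fraction of the context retained) to a predicted task score.
    """
    # Scan from the most aggressive setting (smallest keep ratio) upward.
    for keep_ratio in sorted(candidates):
        if predict(keep_ratio) >= floor:
            return keep_ratio
    # No candidate is predicted to meet the floor: fall back to no compression.
    return 1.0


def toy_predictor(keep_ratio: float) -> float:
    # Illustrative only: predicted score rises linearly with retained context.
    return 0.5 + 0.4 * keep_ratio


chosen = pick_keep_ratio(toy_predictor, floor=0.6)
print(chosen)
```

The chosen ratio would then be passed to whatever off-the-shelf compressor is in use. A context-aware variant could be sketched the same way by building `predict` per input (for example with `functools.partial`), so that the predicted score also depends on the specific context being compressed.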

Taxonomy

Topics: Software System Performance and Reliability · Natural Language Processing Techniques · Big Data and Digital Economy