ConPress: Learning Efficient Reasoning from Multi-Question Contextual Pressure
Jie Deng, Shining Liang, Jun Li, Hongzhi Li, Yutao Xie

TL;DR
ConPress exploits a phenomenon called Self-Compression, in which multi-question prompts lead models to produce shorter reasoning traces, and fine-tunes models on those traces to reduce inference cost without sacrificing accuracy.
Contribution
The paper introduces ConPress, a novel self-supervised fine-tuning method that internalizes self-compression behavior from multi-question prompts to improve reasoning efficiency.
Findings
Reduces reasoning token usage by 59% on MATH500
Reduces reasoning token usage by 33% on AIME25
Maintains competitive accuracy with fewer tokens
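Assuming the percentages above are relative reductions in the average number of reasoning tokens per question compared with the base model (the listing does not state the baseline explicitly), the metric can be written as follows, with $\bar{T}$ denoting mean reasoning tokens per question on a benchmark:

```latex
\text{Reduction} = 1 - \frac{\bar{T}_{\text{ConPress}}}{\bar{T}_{\text{base}}}
```

Under this reading, a 59% reduction on MATH500 means the fine-tuned model emits about $1 - 0.59 = 0.41$ times the base model's reasoning tokens, and a 33% reduction on AIME25 about $0.67$ times.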
Abstract
Large reasoning models (LRMs) typically solve reasoning-intensive tasks by generating long chain-of-thought (CoT) traces, leading to substantial inference overhead. We identify a reproducible inference-time phenomenon, termed Self-Compression: when multiple independent and answerable questions are presented within a single prompt, the model spontaneously produces shorter reasoning traces for each question. This phenomenon arises from multi-question contextual pressure during generation and consistently manifests across models and benchmarks. Building on this observation, we propose ConPress (Learning from Contextual Pressure), a lightweight self-supervised fine-tuning approach. ConPress constructs multi-question prompts to induce self-compression, samples the resulting model outputs, and parses and filters per-question traces to obtain concise yet correct reasoning trajectories. These…
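The data-construction steps described in the abstract (build a multi-question prompt, sample a response, split it into per-question traces, keep the correct ones) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the prompt template, the `### Question k` section headers, the `Final answer:` marker, the exact-match correctness check, and the prompt/completion pair format are all hypothetical, and the sampling step is replaced by a fixed string because it depends on the model and serving stack. Since the abstract is truncated, pairing each single question with the compressed trace it received as a fine-tuning target is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Problem:
    question: str
    gold: str  # reference answer used by the correctness filter


# Hypothetical prompt format: each answer sits under its own header line.
INSTRUCTION = (
    "Solve each of the following independent questions. Start the solution to "
    "question k with the line '### Question k' and end it with 'Final answer: <answer>'."
)
HEADER_RE = re.compile(r"^### Question (\d+)[ \t]*$", re.MULTILINE)
ANSWER_RE = re.compile(r"Final answer:[ \t]*(.+)")


def build_multi_question_prompt(problems: list[Problem]) -> str:
    """Pack several independent problems into one prompt (the contextual pressure)."""
    parts = [INSTRUCTION]
    for k, p in enumerate(problems, start=1):
        parts.append(f"Question {k}: {p.question}")
    return "\n\n".join(parts)


def split_per_question(output: str, n: int) -> dict[int, str]:
    """Cut one multi-question response into per-question traces keyed by index."""
    headers = list(HEADER_RE.finditer(output))
    traces = {}
    for i, m in enumerate(headers):
        k = int(m.group(1))
        end = headers[i + 1].start() if i + 1 < len(headers) else len(output)
        if 1 <= k <= n and k not in traces:  # keep the first section per index
            traces[k] = output[m.end():end].strip()
    return traces


def extract_answer(trace: str) -> Optional[str]:
    """Return the last 'Final answer:' value in a trace, or None if absent."""
    found = ANSWER_RE.findall(trace)
    return found[-1].strip() if found else None


def default_is_correct(pred: str, gold: str) -> bool:
    # Exact string match; math benchmarks would need an equivalence checker.
    return pred.strip() == gold.strip()


def build_sft_examples(
    problems: list[Problem],
    output: str,
    is_correct: Callable[[str, str], bool] = default_is_correct,
) -> list[dict[str, str]]:
    """Keep only correct per-question traces, as single-question training pairs."""
    examples = []
    for k, trace in split_per_question(output, len(problems)).items():
        pred = extract_answer(trace)
        if pred is not None and is_correct(pred, problems[k - 1].gold):
            # Pair the single question with the trace produced under pressure.
            examples.append({"prompt": problems[k - 1].question, "completion": trace})
    return examples


if __name__ == "__main__":
    problems = [Problem("What is 2 + 3?", "5"), Problem("What is 4 * 6?", "24")]
    prompt = build_multi_question_prompt(problems)
    # Placeholder for sampling a response to `prompt` from the model.
    output = (
        "### Question 1\n2 + 3 = 5.\nFinal answer: 5\n"
        "### Question 2\n4 * 6 = 25.\nFinal answer: 25"
    )
    # Keeps only question 1; question 2's answer (25) fails the check.
    print(build_sft_examples(problems, output))
```

Questions whose section header or final answer cannot be found, or whose answer fails the check, contribute no example, so the filter discards data rather than keeping an unverified trace.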
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning
