Squeez: Task-Conditioned Tool-Output Pruning for Coding Agents

Ádám Kovács

arXiv:2604.04979·cs.SE·April 8, 2026

TL;DR

This paper introduces task-conditioned tool-output pruning for coding agents: a fine-tuned Qwen 3.5 2B model removes 92% of input tokens while reaching 0.86 recall and 0.80 F1 on evidence retrieval.

Contribution

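The paper presents a benchmark of 11,477 examples, including a manually curated 618-example test set, and a LoRA fine-tune of Qwen 3.5 2B that outperforms larger zero-shot models and heuristic baselines at evidence pruning for coding tasks.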

Findings

01  The fine-tuned model reaches 0.86 recall and 0.80 F1.

02  It removes 92% of input tokens while reaching those scores.

03  It outperforms zero-shot Qwen 3.5 35B A3B by 11 recall points and all heuristic baselines by a wide margin.

Abstract

Coding agents repeatedly consume long tool observations even though only a small fraction of each observation matters for the next step. We study task-conditioned tool-output pruning: given a focused query and one tool output, return the smallest verbatim evidence block the agent should inspect next. We introduce a benchmark of 11,477 examples built from SWE-bench repository interactions and synthetic multi-ecosystem tool outputs, with a manually curated 618-example test set. We fine-tune Qwen 3.5 2B with LoRA and compare it against larger zero-shot models and heuristic pruning baselines. Our model reaches 0.86 recall and 0.80 F1 while removing 92% of input tokens, outperforming zero-shot Qwen 3.5 35B A3B by 11 recall points and all heuristic baselines by a wide margin.
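To make the task and its metrics concrete, here is a minimal sketch of a pruner and a scorer. It is not the paper's implementation. The abstract does not say how the heuristic baselines work or how recall, F1, and token reduction are computed, so everything here is an assumption: `keyword_prune` is a hypothetical keyword-overlap heuristic, evidence is scored at line level, and tokens are counted by whitespace splitting.

```python
from dataclasses import dataclass


@dataclass
class PruneScore:
    precision: float
    recall: float
    f1: float
    removed_fraction: float  # share of whitespace tokens dropped


def keyword_prune(query: str, tool_output: str, context: int = 0) -> list[int]:
    """Hypothetical heuristic baseline (not from the paper).

    Keeps every line that contains a query word longer than two
    characters, plus `context` neighbouring lines on each side.
    Returns the 0-based indices of the kept lines.
    """
    terms = {w.lower() for w in query.split() if len(w) > 2}
    lines = tool_output.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        lowered = line.lower()
        if any(term in lowered for term in terms):
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    return sorted(keep)


def score(predicted: list[int], gold: list[int], tool_output: str) -> PruneScore:
    """Line-level precision/recall/F1 plus the fraction of tokens removed.

    Assumption: gold evidence is a set of line indices, and tokens are
    approximated by whitespace splitting rather than a model tokenizer.
    """
    lines = tool_output.splitlines()
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = sum(len(line.split()) for line in lines)
    kept = sum(len(lines[i].split()) for i in pred)
    removed = 1 - kept / total if total else 0.0
    return PruneScore(precision, recall, f1, removed)


if __name__ == "__main__":
    # Toy tool output; the gold evidence lines are made up for illustration.
    output = "\n".join([
        "collected 3 items",
        "test_a PASSED",
        "test_b FAILED",
        "AssertionError: expected 2 got 3",
        "test_c PASSED",
    ])
    kept = keyword_prune("why did test_b fail", output)
    print(kept, score(kept, gold=[2, 3], tool_output=output))
```

In this toy case the keyword heuristic finds the failing test line but misses the assertion message that explains it, which is the kind of recall gap a task-conditioned model is meant to close. Raising `context` recovers the missed line at the cost of keeping more tokens.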

Code & Models

Repositories

krlabsorg/squeez (GitHub)
