Compress to Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents

Yurun Song; Jiong Yin; Rongjunchen Zhang; Ian G. Harris

arXiv:2601.11631·cs.CV·January 21, 2026

Open Access

TL;DR

This paper introduces CCPO, a framework that couples visual compression with policy optimization so that multi-turn GUI agents focus on task-relevant screen regions, shrink their context, and train faster.

Contribution

The paper proposes Coordinate Compression Policy Optimization (CCPO), which combines Coordinate-Aware Spatial Compression (CASC) with a Distance-Based Advantage to give multi-turn GUI agents efficient, focused decision-making and state-of-the-art results.

Findings

01. Achieves up to 55% token compression
02. Provides 3.8× training speedup
03. Outperforms existing methods on four benchmarks

Abstract

Multi-turn GUI agents enable complex task completion through sequential decision-making, but suffer from severe context inflation as interaction history accumulates. Existing strategies either sacrifice long-term context via truncation or compromise spatial structure through token pruning. In this paper, we propose Coordinate Compression Policy Optimization (CCPO), an efficient policy optimization framework that couples visual compression with policy optimization for multi-turn GUI agents. CCPO introduces Coordinate-Aware Spatial Compression (CASC), which aggregates coordinates from multiple rollouts to capture target-relevant regions and progressively narrow historical attention around key visual areas. From interactions across rollouts, CASC adaptively constructs attention boundaries that concentrate computation on the most informative regions of the scene. We further design a…
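The abstract describes CASC only at a high level, so the following is a minimal sketch of the general idea rather than the authors' procedure. It assumes that each rollout yields one predicted (x, y) click coordinate, that the "attention boundary" can be approximated by a padded bounding box around those coordinates, and that a historical screenshot would then be cropped to that box. The function names, the `margin` parameter, and the bounding-box rule are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def aggregate_region(points: List[Point], width: int, height: int,
                     margin: int = 32) -> Box:
    """Hypothetical aggregation rule: padded bounding box over rollout coordinates.

    Falls back to the full screenshot when no coordinates are available.
    """
    if not points:
        return (0, 0, width, height)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Pad the tight box by `margin` and clamp it to the screenshot bounds.
    left = max(0, int(min(xs)) - margin)
    top = max(0, int(min(ys)) - margin)
    right = min(width, int(max(xs)) + margin)
    bottom = min(height, int(max(ys)) + margin)
    return (left, top, right, bottom)


def kept_fraction(box: Box, width: int, height: int) -> float:
    """Share of the screenshot area that survives the crop."""
    left, top, right, bottom = box
    return ((right - left) * (bottom - top)) / (width * height)


# Three rollouts that roughly agree on the target location (made-up values).
rollout_points = [(100, 200), (140, 260), (120, 220)]
box = aggregate_region(rollout_points, width=1000, height=800, margin=20)
fraction = kept_fraction(box, width=1000, height=800)
```

In this toy setting, rollouts that agree closely produce a small box and a small kept fraction, while scattered rollouts produce a larger box. That mirrors the abstract's description of boundaries that adapt to interactions across rollouts, but the compression figures reported in the paper come from the authors' method, not from this sketch.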

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics