Semantic-Geometric Dual Compression: Training-Free Visual Token Reduction for Ultra-High-Resolution Remote Sensing Understanding

Yueying Li; Fengxiang Wang; Yan Li; Mingshuo Chen; Mengying Zhao; Long Lan

arXiv:2604.11122 · cs.CV · April 14, 2026

TL;DR

This paper introduces DualComp, a training-free, task-adaptive visual token compression framework for ultra-high-resolution remote sensing imagery. It reduces the computational cost of MLLM inference while maintaining interpretation accuracy.

Contribution

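DualComp is a dual-stream token compression framework guided by a lightweight pre-trained router, so that compression adapts to the task at hand: object semantic tasks benefit from aggressive background pruning, whereas scene geometric tasks rely on intact spatial topology.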

Findings

01 Achieves high-fidelity interpretation with low computational cost.

02 Improves efficiency and accuracy on the XLRS-Bench benchmark.

03 Effectively preserves small objects and spatial topology.

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated immense potential in Earth observation. However, the massive visual tokens generated when processing Ultra-High-Resolution (UHR) imagery introduce prohibitive computational overhead, severely bottlenecking their inference efficiency. Existing visual token compression methods predominantly adopt static and uniform compression strategies, neglecting the inherent "Semantic-Geometric Duality" in remote sensing interpretation tasks. Specifically, object semantic tasks focus on the abstract semantics of objects and benefit from aggressive background pruning, whereas scene geometric tasks critically rely on the integrity of spatial topology. To address this challenge, we propose DualComp, a task-adaptive dual-stream token compression framework. Dynamically guided by a lightweight pre-trained router, DualComp decouples feature…
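To make the semantic-geometric distinction in the abstract concrete, here is a minimal sketch of task-adaptive token compression. It is not the authors' implementation: the keyword-based `route_task` is a hypothetical stand-in for the paper's lightweight pre-trained router, the per-token saliency `scores` are assumed to be given, and average pooling is only one assumed way to shrink a token grid while keeping its spatial layout. All function and parameter names are illustrative.

```python
from typing import List

Token = List[float]


def route_task(question: str) -> str:
    """Hypothetical keyword router (assumption); the paper uses a pre-trained router."""
    geometric_cues = ("route", "path", "distance", "layout", "direction", "adjacent")
    q = question.lower()
    return "geometric" if any(cue in q for cue in geometric_cues) else "semantic"


def semantic_prune(tokens: List[Token], scores: List[float], keep: int) -> List[Token]:
    """Keep the `keep` highest-scoring tokens, returned in their original order."""
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def geometric_pool(grid: List[List[Token]], window: int) -> List[List[Token]]:
    """Average-pool a 2D token grid with non-overlapping windows.

    Every region of the image stays represented, so relative positions survive.
    """
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows, window):
        out_row = []
        for c in range(0, cols, window):
            cell = [
                grid[i][j]
                for i in range(r, min(r + window, rows))
                for j in range(c, min(c + window, cols))
            ]
            dim = len(cell[0])
            out_row.append([sum(t[d] for t in cell) / len(cell) for d in range(dim)])
        pooled.append(out_row)
    return pooled


def compress(
    grid: List[List[Token]],
    scores: List[List[float]],
    question: str,
    keep: int = 4,
    window: int = 2,
) -> List[Token]:
    """Dispatch to one of the two compression streams based on the routed task type."""
    if route_task(question) == "semantic":
        flat_tokens = [t for row in grid for t in row]
        flat_scores = [s for row in scores for s in row]
        return semantic_prune(flat_tokens, flat_scores, keep)
    return [t for row in geometric_pool(grid, window) for t in row]


# Toy 4x4 grid: each token stores its own (row, col); saliency grows with position.
grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
scores = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

semantic_out = compress(grid, scores, "What type of aircraft is parked here?")
geometric_out = compress(grid, scores, "Plan a route from the port to the airport.")
```

In this toy setup both streams reduce 16 tokens to 4, but in different ways: the semantic stream discards every low-saliency token regardless of where it sits, while the geometric stream keeps one averaged token per 2x2 region so that coverage of the whole scene is preserved.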
