Language-Guided Token Compression with Reinforcement Learning in Large Vision-Language Models

Sihan Cao, Jianwei Zhang, Pengcheng Zheng, Jiaxin Yan, Caiyan Qin, Yalan Ye, Wei Dong, Peng Wang, Yang Yang, Chaoning Zhang

arXiv:2603.13394 · cs.CV · March 17, 2026

TL;DR

This paper introduces TPRL, a reinforcement learning framework that adaptively prunes visual tokens in large vision-language models, significantly reducing inference costs with minimal accuracy loss.

Contribution

The paper presents a novel RL-based method for sequential visual token pruning guided by language, optimizing both accuracy and efficiency in large vision-language models.

Findings

01. Up to 66.7% of visual tokens removed

02. 54.2% reduction in FLOPs

03. 0.7% accuracy drop
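To make the headline removal rate concrete, the short calculation below converts it into token counts. The page does not name the backbone or its visual-token budget, so the 576-token input (a 24×24 patch grid, as produced by LLaVA-1.5 at 336 px) is an assumption, and `tokens_after_pruning` is a hypothetical helper. The FLOPs figure cannot be derived this way because it depends on model details not given here.

```python
from typing import Tuple


def tokens_after_pruning(num_visual_tokens: int, removal_rate: float) -> Tuple[int, int]:
    """Return (removed, kept) visual-token counts for a given removal rate."""
    removed = round(num_visual_tokens * removal_rate)
    return removed, num_visual_tokens - removed


# Assumed budget: a 24x24 patch grid, i.e. 576 visual tokens (not stated on this page).
removed, kept = tokens_after_pruning(576, 0.667)
print(removed, kept)  # 384 192
```

Under this assumption, removing 66.7% of the tokens leaves roughly one third (192) for the language model to attend over.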

Abstract

Large Vision-Language Models (LVLMs) incur substantial inference costs due to the processing of a vast number of visual tokens. Existing methods typically struggle to model progressive visual token reduction as a multi-step decision process with sequential dependencies and often rely on hand-engineered scoring rules that lack adaptive optimization for complex reasoning trajectories. To overcome these limitations, we propose TPRL, a reinforcement learning framework that learns adaptive pruning trajectories through language-guided sequential optimization tied directly to end-task performance. We formulate visual token pruning as a sequential decision process with explicit state transitions and employ a self-supervised autoencoder to compress visual tokens into a compact state representation for efficient policy learning. The pruning policy is initialized through learning from…
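The abstract frames pruning as a sequential decision process with explicit state transitions, a compressed state, and a reward tied to end-task performance. The toy environment below sketches that framing under loudly stated assumptions. Mean pooling stands in for the learned self-supervised autoencoder. Dot-product relevance to a query embedding stands in for language guidance. The reward (retained query relevance minus a cost on kept tokens) is a hypothetical proxy for task accuracy. `PruningEnv`, `rollout`, `make_random_policy`, and `cost_weight` are illustrative names, not TPRL's code. No policy is trained here; the sketch only shows the state/action/reward loop that a policy-gradient method would optimize.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]
State = Tuple[Vector, float]  # (compressed token summary, fraction of tokens kept)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode_state(tokens: List[Vector], dim: int) -> Vector:
    """Stand-in for the learned autoencoder: mean-pool the kept tokens (hypothetical)."""
    if not tokens:
        return [0.0] * dim
    return [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]


@dataclass
class PruningEnv:
    """Toy multi-step pruning process; all names and the reward are hypothetical."""
    tokens: List[Vector]      # visual token embeddings
    query: Vector             # text query embedding used for language guidance
    num_steps: int            # number of sequential pruning decisions
    cost_weight: float = 0.5  # trade-off between task proxy and kept-token cost

    def reset(self) -> State:
        self.kept = list(range(len(self.tokens)))
        self.step_idx = 0
        return self._state()

    def _state(self) -> State:
        kept_tokens = [self.tokens[i] for i in self.kept]
        summary = encode_state(kept_tokens, len(self.query))
        return summary, len(self.kept) / len(self.tokens)

    def _task_score(self) -> float:
        # Proxy for end-task accuracy: share of positive query relevance retained.
        rel = [max(0.0, dot(t, self.query)) for t in self.tokens]
        total = sum(rel)
        return sum(rel[i] for i in self.kept) / total if total > 0 else 0.0

    def step(self, prune_fraction: float) -> Tuple[State, float, bool]:
        # Rank surviving tokens by relevance to the query and drop the weakest ones.
        ranked = sorted(self.kept, key=lambda i: dot(self.tokens[i], self.query), reverse=True)
        n_keep = max(1, round(len(ranked) * (1.0 - prune_fraction)))
        self.kept = sorted(ranked[:n_keep])
        self.step_idx += 1
        done = self.step_idx >= self.num_steps
        # Sparse terminal reward: task proxy minus a cost on the tokens still kept.
        reward = 0.0
        if done:
            reward = self._task_score() - self.cost_weight * len(self.kept) / len(self.tokens)
        return self._state(), reward, done


def make_random_policy(weights: List[float], seed: int = 0) -> Callable[[State], float]:
    """Untrained stochastic policy over a few discrete prune fractions."""
    rng = random.Random(seed)
    choices = [0.0, 0.25, 0.5]
    return lambda state: rng.choices(choices, weights=weights, k=1)[0]


def rollout(env: PruningEnv, policy: Callable[[State], float]) -> Tuple[float, List[Tuple[float, float]]]:
    """Run one episode and return the summed reward and (action, reward) pairs."""
    state = env.reset()
    total, trajectory, done = 0.0, [], False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        trajectory.append((action, reward))
        total += reward
    return total, trajectory


env = PruningEnv(tokens=[[float(i), 0.0] for i in range(8)], query=[1.0, 0.0], num_steps=2)
total, trajectory = rollout(env, lambda state: 0.5)
print(env.kept, round(total, 4))  # [6, 7] 0.3393
```

A constant policy that prunes half of the surviving tokens at each of two steps keeps a quarter of them. A learned policy would instead choose each step's fraction from the current state, which is what lets the trajectory adapt to the query.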

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications