CORE: Compact Object-centric REpresentations as a New Paradigm for Token Merging in LVLMs

Jingyu Lei; Gaoang Wang; Der-Horng Lee

arXiv:2511.14072 · cs.CV · November 19, 2025

TL;DR

CORE introduces object-centric visual token compression using semantic segmentation and centroid-guided sorting, significantly reducing computational costs while maintaining high performance in large vision-language models.

Contribution

It proposes a novel object-centric token merging paradigm with a segmentation decoder and centroid-guided sorting, improving efficiency and semantic preservation in LVLMs.

Findings

01. State-of-the-art results on six benchmarks
02. Retains 97.4% of baseline performance while keeping only 2.2% of the visual tokens
03. Dramatic efficiency gains in adaptive-rate settings
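
To put the 2.2% figure in perspective, the sketch below counts patch tokens for a square image and the number that survive at a given keep ratio. It assumes a CLIP-style ViT with 14-pixel patches at 336 px or 672 px input (a common LLaVA-1.5-style setup); the page does not say which encoder or resolution CORE was evaluated with, and the function names are hypothetical.

```python
def visual_token_count(image_size: int, patch_size: int) -> int:
    """Patch tokens emitted by a ViT-style encoder for a square image."""
    side = image_size // patch_size
    return side * side  # grows quadratically with the image side length


def kept_token_budget(total_tokens: int, keep_ratio: float) -> int:
    """Tokens retained at a given keep ratio (rounded, never below 1)."""
    return max(1, round(total_tokens * keep_ratio))


if __name__ == "__main__":
    # Assumed 14-pixel patches; doubling the side length quadruples the tokens.
    for size in (336, 672):
        total = visual_token_count(size, 14)
        print(size, total, kept_token_budget(total, 0.022))
```

Under these assumptions, 336 px yields 576 tokens, of which about 13 remain at 2.2%; at 672 px the grid grows to 2,304 tokens and about 51 remain.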

Abstract

Large Vision-Language Models (LVLMs) usually suffer from prohibitive computational and memory costs due to the quadratic growth of visual tokens with image resolution. Existing token compression methods, while varied, often lack a high-level semantic understanding, leading to suboptimal merges, information redundancy, or context loss. To address these limitations, we introduce CORE (Compact Object-centric REpresentations), a new paradigm for visual token compression. CORE leverages an efficient segmentation decoder to generate object masks, which serve as a high-level semantic prior to guide the merging of visual tokens into a compact set of object-centric representations. Furthermore, a novel centroid-guided sorting mechanism restores a coherent spatial order to the merged tokens, preserving vital positional information. Extensive experiments show that CORE not only establishes a new…
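
The abstract names two steps, mask-guided merging and centroid-guided sorting, without giving implementation details. The following is a minimal NumPy sketch under stated assumptions, not the authors' method: object masks are assumed to be already available at patch resolution, each object's tokens are merged by plain average pooling, patches covered by no mask are simply dropped, and "centroid-guided sorting" is approximated as raster order (row, then column) of each mask's centroid. The function name `merge_and_sort` is hypothetical.

```python
import numpy as np


def merge_and_sort(tokens: np.ndarray, masks: np.ndarray, grid_w: int) -> np.ndarray:
    """Hypothetical sketch of mask-guided merging plus centroid-based ordering.

    tokens: (N, D) visual tokens in row-major order on a patch grid of width grid_w.
    masks:  (K, N) boolean object masks on the same grid (assumed to come from a
            segmentation decoder, already downsampled to patch resolution).
    Returns a (K', D) array with one token per non-empty mask, ordered by the
    (row, column) centroid of its mask. Patches covered by no mask are dropped.
    """
    n, d = tokens.shape
    if masks.shape[1] != n:
        raise ValueError("masks must cover the same N patches as tokens")
    rows = np.arange(n) // grid_w  # row index of each patch
    cols = np.arange(n) % grid_w   # column index of each patch

    merged, cent_r, cent_c = [], [], []
    for m in masks.astype(bool):
        if not m.any():
            continue  # an empty mask contributes no token
        merged.append(tokens[m].mean(axis=0))  # average-pool the object's patches
        cent_r.append(rows[m].mean())          # centroid row of the mask
        cent_c.append(cols[m].mean())          # centroid column of the mask

    if not merged:
        return np.zeros((0, d), dtype=tokens.dtype)

    # np.lexsort treats the LAST key as primary: sort by row, then by column.
    order = np.lexsort((np.array(cent_c), np.array(cent_r)))
    return np.stack(merged)[order]


if __name__ == "__main__":
    tokens = np.arange(32, dtype=float).reshape(16, 2)  # 4x4 grid, D = 2
    masks = np.zeros((2, 16), dtype=bool)
    masks[0, [10, 11, 14, 15]] = True  # bottom-right object, listed first
    masks[1, [0, 1, 4, 5]] = True      # top-left object, listed second
    print(merge_and_sort(tokens, masks, grid_w=4))
```

In the usage example the masks are supplied bottom-right first, yet the result is [[5, 6], [25, 26]]: the top-left object's merged token comes first because its centroid (0.5, 0.5) precedes (2.5, 2.5) in raster order. This ordering step is what the abstract credits with restoring a coherent spatial order after merging.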

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications