LFTR: Learning-Free Token Reduction for Multimodal Large Language Models

Zihui Zhao; Yingxin Li; Yang Li

arXiv:2501.17391 · cs.CV · October 1, 2025

TL;DR

LFTR is a learning-free method that reduces the number of visual tokens in multimodal large language models by up to 16x, cutting computational load without retraining while maintaining or improving performance on vision question-answering benchmarks.

Contribution

The paper introduces LFTR, a learning-free token reduction technique that integrates into existing MLLMs and reduces token count and computational cost without additional training.
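To make "learning-free token reduction" concrete, here is a minimal sketch of one generic training-free approach: averaging fixed-size groups of adjacent visual tokens. This page does not describe LFTR's actual mechanism, so the function `pool_tokens`, the grouping strategy, and the toy token values are all assumptions chosen for illustration, not the authors' algorithm.

```python
from typing import List


def pool_tokens(tokens: List[List[float]], factor: int) -> List[List[float]]:
    """Average each run of `factor` adjacent token vectors into one vector.

    Hypothetical stand-in for a learning-free reducer: it has no trainable
    parameters, so it can be applied to a frozen model's visual tokens.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    pooled = []
    for start in range(0, len(tokens), factor):
        group = tokens[start:start + factor]  # last group may be shorter
        dim = len(group[0])
        pooled.append([sum(tok[d] for tok in group) / len(group) for d in range(dim)])
    return pooled


# Toy input: 32 two-dimensional "visual tokens" (illustrative values only).
tokens = [[float(i), float(2 * i)] for i in range(32)]
pooled = pool_tokens(tokens, factor=16)
print(len(tokens), "->", len(pooled))
print(pooled)
```

Because the reducer has no learned weights, it could in principle sit between a vision encoder and a language model without any fine-tuning, which is the property the contribution statement emphasizes.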

Findings

01 Achieves up to 16x reduction in visual tokens

02 Maintains or improves performance on vision question-answering benchmarks

03 Complementary to other acceleration methods

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated exceptional success in various multimodal tasks, yet their deployment is frequently limited by substantial computational demands and prolonged inference times. The vision modality typically contains more comprehensive information than the text modality, so its encoded representations comprise an extensive number of tokens, which leads to significant computational overhead due to the quadratic complexity of the attention mechanism. Current token reduction methods are typically restricted to specific model architectures and often necessitate extensive retraining or fine-tuning, limiting their applicability to many state-of-the-art models. In this paper, we introduce a learning-free token reduction (LFTR) method designed for MLLMs. LFTR can be seamlessly integrated into most open-source MLLM architectures without…
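The quadratic-cost argument can be checked with simple arithmetic. The sketch below counts query-key pairs in full self-attention before and after a 16x cut in visual tokens (the maximum reduction listed in the findings). The specific counts of 576 visual tokens and 64 text tokens are illustrative assumptions, not figures from the paper, and the helper names are hypothetical.

```python
def attention_pairs(num_visual: int, num_text: int) -> int:
    """Query-key pairs in full self-attention over the combined sequence."""
    seq_len = num_visual + num_text
    return seq_len * seq_len


def reduce_visual_tokens(num_visual: int, factor: int) -> int:
    """Visual tokens left after an N-fold reduction (rounded up)."""
    return -(-num_visual // factor)  # ceiling division


# Assumed sequence sizes, for illustration only.
visual, text, factor = 576, 64, 16

reduced = reduce_visual_tokens(visual, factor)
before = attention_pairs(visual, text)
after = attention_pairs(reduced, text)

print(f"visual tokens: {visual} -> {reduced}")
print(f"attention pairs: {before} -> {after} ({before / after:.2f}x fewer)")
```

Under these assumed sizes, the attention pair count falls by more than the 16x token reduction because the cost grows with the square of the total sequence length; the exact saving depends on how many text tokens share the sequence.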

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Metallurgy and Material Forming

Methods: Focus