FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance

Haicheng Wang; Zhemeng Yu; Gabriele Spadaro; Chen Ju; Victor Quétu; Shuai Xiao; Enzo Tartaglione

arXiv:2501.02430·cs.CV·April 11, 2025

TL;DR

FOLDER is a plug-and-play module that shortens the visual token sequence in multi-modal large language models. By removing up to 70% of visual tokens, it significantly accelerates inference and training while maintaining or improving performance.

Contribution

The paper introduces FOLDER, a novel token reduction method that preserves key information and accelerates multi-modal large language models without sacrificing accuracy.

Findings

01 FOLDER reduces visual tokens by up to 70%.

02 Models with FOLDER achieve comparable or better performance.

03 FOLDER accelerates inference and training processes.

Abstract

Recently, Multi-modal Large Language Models (MLLMs) have shown remarkable effectiveness for multi-modal tasks due to their abilities to generate and understand cross-modal data. However, processing long sequences of visual tokens extracted from visual backbones poses a challenge for deployment in real-time applications. To address this issue, we introduce FOLDER, a simple yet effective plug-and-play module designed to reduce the length of the visual token sequence, mitigating both computational and memory demands during training and inference. Through a comprehensive analysis of the token reduction process, we analyze the information loss introduced by different reduction strategies and develop FOLDER to preserve key information while removing visual redundancy. We showcase the effectiveness of FOLDER by integrating it into the visual backbone of several MLLMs, significantly…
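
The abstract excerpt above does not spell out FOLDER's reduction rule, so the following is only an illustrative sketch of one generic way to shorten a token sequence: greedily averaging the most similar adjacent tokens until a target length is reached. The names `merge_tokens` and `keep_ratio`, and the choice of cosine similarity between neighbours, are assumptions made for illustration and should not be read as the paper's method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_tokens(tokens, keep_ratio):
    """Hypothetical helper: shrink `tokens` to about `keep_ratio` of its length.

    Repeatedly replaces the most similar adjacent pair with its weighted mean,
    so every original token still contributes to exactly one output token.
    This is a generic merging heuristic, not the algorithm from the paper.
    """
    target = max(1, round(len(tokens) * keep_ratio))
    # Each entry holds (vector, number of original tokens averaged into it).
    merged = [(list(t), 1) for t in tokens]
    while len(merged) > target:
        # Index of the adjacent pair with the highest cosine similarity.
        best = max(
            range(len(merged) - 1),
            key=lambda i: cosine(merged[i][0], merged[i + 1][0]),
        )
        (va, ca), (vb, cb) = merged[best], merged[best + 1]
        avg = [(x * ca + y * cb) / (ca + cb) for x, y in zip(va, vb)]
        merged[best:best + 2] = [(avg, ca + cb)]
    return [vec for vec, _ in merged]


# Two identical tokens collapse into one; the dissimilar third token survives.
demo = merge_tokens([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], keep_ratio=2 / 3)
print(demo)  # [[1.0, 0.0], [0.0, 1.0]]
```

With `keep_ratio=0.3`, a sequence of 10 tokens is reduced to 3, which corresponds to the "up to 70%" removal figure quoted above. The sketch merges tokens rather than dropping them, so redundant neighbours are compressed while distinct ones are kept.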

Code & Models

Repositories

anakin-skywalker-joseph/folder
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems