Fourier Compressor: Frequency-Domain Visual Token Compression for Vision-Language Models

Huanyu Wang; Jushi Kai; Haoli Bai; Lu Hou; Bo Jiang; Ziwei He; Zhouhan Lin

arXiv:2508.06038 · cs.CV · May 19, 2026

TL;DR

Fourier Compressor is a parameter-free, frequency-domain module for compressing visual tokens in vision-language models. It applies to both images and videos and cuts computation by up to 83.8% of FLOPs while retaining over 96% of the original accuracy.

Contribution

It introduces a frequency-domain visual token compression method that outperforms existing parameter-free approaches and generalizes across multiple architectures and tasks.

Findings

01

Retains over 96% of original accuracy with up to 83.8% FLOPs reduction.

02

Boosts generation speed by 31.2%.

03

Outperforms existing parameter-free methods and surpasses some parameterized approaches.

Abstract

Vision-Language Models (VLMs) incur substantial computational overhead and inference latency due to the large number of vision tokens introduced by high-resolution image and video inputs. Existing parameter-free token compression methods typically rely on token selection or merging, yet they risk discarding substantial visual information or distorting the original representation distribution, resulting in pronounced performance degradation at high compression ratios. In response, we aim to explore a more effective and efficient visual token compression strategy, with a promising direction in the frequency domain. Motivated by the success of frequency-domain transforms in image compression (e.g., JPEG), we systematically analyze the frequency redundancy in visual representations and uncover a non-uniform distribution of semantic information across frequency bands. Building upon this, we…
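The abstract stops before describing the method itself, so the following is only a minimal sketch of the general idea it motivates (JPEG-style truncation of high-frequency components), not the authors' Fourier Compressor. It assumes the visual tokens sit on an H×W grid with D channels and keeps only the lowest spatial frequencies of a 2D discrete Fourier transform. The helper `fourier_compress`, the grid sizes, and the mean-preserving rescale are illustrative assumptions.

```python
import numpy as np


def fourier_compress(tokens: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Low-pass compress an (H, W, D) grid of visual tokens to (out_h, out_w, D).

    Hypothetical helper for illustration only: it keeps the lowest spatial
    frequencies of a 2D DFT taken over the token grid. It is not the
    paper's implementation.
    """
    h, w, _ = tokens.shape
    if not (0 < out_h <= h and 0 < out_w <= w):
        raise ValueError("output grid must be non-empty and no larger than the input grid")

    # 2D DFT over the two spatial axes; each channel is transformed independently.
    spectrum = np.fft.fft2(tokens, axes=(0, 1))
    # Move the zero-frequency (DC) term to the centre so low frequencies are contiguous.
    spectrum = np.fft.fftshift(spectrum, axes=(0, 1))

    # Crop the central out_h x out_w block, i.e. the lowest-frequency coefficients.
    top = h // 2 - out_h // 2
    left = w // 2 - out_w // 2
    low = spectrum[top:top + out_h, left:left + out_w, :]

    # Undo the shift so the DC term returns to index (0, 0), then invert.
    low = np.fft.ifftshift(low, axes=(0, 1))
    compressed = np.fft.ifft2(low, axes=(0, 1))

    # ifft2 normalises by out_h * out_w rather than h * w, so rescale to keep
    # each channel's mean unchanged. Taking .real drops the small imaginary
    # residue left when the cropped spectrum is not exactly conjugate-symmetric.
    return (compressed.real * (out_h * out_w) / (h * w)).astype(tokens.dtype)


# Usage: a hypothetical 24x24 grid of 64-dimensional tokens compressed to 12x12.
rng = np.random.default_rng(0)
vis_tokens = rng.standard_normal((24, 24, 64)).astype(np.float32)
small = fourier_compress(vis_tokens, 12, 12)
print(small.shape)  # (12, 12, 64)
```

Shrinking a 24×24 grid to 12×12 reduces the token count from 576 to 144, i.e. 75% fewer tokens reach the language model. The operation has no learned weights, so it is parameter-free in the same sense as the methods the abstract compares against; the exact transform, the choice of frequency bands to keep, and where the module sits inside the VLM are details this sketch does not attempt to reproduce.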

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis