RQ-MoE: Residual Quantization via Mixture of Experts for Efficient Input-Dependent Vector Compression

Zhengjia Zhong, Shuyan Ke, Zaizhou Lin, Jiaqi Song, Hongyi Lan, and Hui Li

arXiv:2605.14359 · cs.LG · May 15, 2026

TL;DR

The paper introduces RQ-MoE, a vector quantization framework that uses a mixture of experts to adapt codebooks to each input. It reports 6x-14x faster decoding than previous vector quantization methods, with state-of-the-art or comparable performance in reconstruction and retrieval tasks.

Contribution

It proposes a new residual quantization method combining a two-level MoE with dual-stream quantization, allowing dynamic codebook construction and parallel decoding.

Findings

01. Achieves state-of-the-art or comparable performance in reconstruction and retrieval tasks.

02. Provides 6x-14x faster decoding than previous vector quantization methods.

03. Theoretically unifies standard Residual Quantization and QINCo as special cases.

Abstract

Vector quantization is a fundamental tool for compressing high-dimensional embeddings, yet existing multi-codebook methods rely on static codebooks that limit expressiveness under heterogeneous data geometry. While recent dynamic quantizers like QINCo adapt codebooks to individual inputs and improve expressiveness, their strict sequential dependencies create decoding bottlenecks. We propose Residual Quantization via Mixture of Experts (RQ-MoE), a framework combining a two-level MoE with dual-stream quantization to enable input-dependent codebook adaptation for efficient vector quantization. RQ-MoE enables dynamic codebook construction and decouples instruction from quantization, facilitating parallel decoding. Theoretically, we show that standard Residual Quantization and QINCo can be recovered as constrained special cases of RQ-MoE, and derive a guideline for setting expert…
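For readers unfamiliar with the baseline, the following is a minimal sketch of classical residual quantization with static codebooks, the setting the abstract describes as a constrained special case. It is not the RQ-MoE method and is not taken from the paper or its repository. The function names `rq_encode` and `rq_decode`, the random codebooks, and the sizes `d`, `K`, and `M` are all assumptions chosen for illustration.

```python
import numpy as np

def rq_encode(x, codebooks):
    """Greedy residual quantization (illustrative sketch, not RQ-MoE).

    At each stage, pick the codeword nearest to the current residual,
    then subtract it and pass the remainder to the next stage.
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    codes = []
    for C in codebooks:  # each C has shape (K, d)
        dists = np.sum((C - residual) ** 2, axis=1)  # squared L2 distances
        k = int(np.argmin(dists))
        codes.append(k)
        residual = residual - C[k]
    return codes

def rq_decode(codes, codebooks):
    """Reconstruct by summing one selected codeword per stage."""
    return sum(C[k] for C, k in zip(codebooks, codes))

# Hypothetical setup: random codebooks stand in for trained ones.
rng = np.random.default_rng(0)
d, K, M = 8, 16, 4  # dimension, codewords per stage, number of stages
codebooks = [rng.normal(size=(K, d)) / (2 ** m) for m in range(M)]

x = rng.normal(size=d)
codes = rq_encode(x, codebooks)
x_hat = rq_decode(codes, codebooks)
print("codes:", codes)
print("squared error:", float(np.sum((x - x_hat) ** 2)))
```

In this static setting every codebook is fixed ahead of time, so the decoder's lookups do not depend on one another. Input-dependent quantizers give up that independence when each stage's codebook is conditioned on earlier stages, which is the sequential decoding bottleneck the abstract attributes to QINCo.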

Code & Models

Repositories

KDEGroup/RQ-MoE (GitHub)
