MLoRQ: Bridging Low-Rank and Quantization for Transformer Compression

Ofir Gordon; Ariel Lapid; Elad Cohen; Yarden Yagil; Arnon Netzer; Hai Victor Habi

arXiv:2507.09616·cs.LG·July 15, 2025

TL;DR

MLoRQ combines low-rank approximation with mixed-precision quantization to compress transformer models under a memory constraint, and reports state-of-the-art results on vision tasks, with up to 15% performance improvement on vision transformers.

Contribution

Introduces MLoRQ, a two-stage optimization that assigns a rank and a bit-width to each layer, integrating low-rank approximation and mixed-precision quantization for transformer compression under a predefined memory constraint.

Findings

01. Up to 15% performance improvement on vision transformers.

02. Compatible with most existing quantization algorithms.

03. Effective across image classification, object detection, and segmentation.

Abstract

Deploying transformer-based neural networks on resource-constrained edge devices presents a significant challenge. This challenge is often addressed through various techniques, such as low-rank approximation and mixed-precision quantization. In this work, we introduce Mixed Low-Rank and Quantization (MLoRQ), a novel method that integrates both techniques. MLoRQ employs a two-stage optimization process to determine optimal bit-width and rank assignments for each layer, adhering to predefined memory constraints. This process includes: (i) an intra-layer optimization that identifies potentially optimal compression solutions out of all low-rank and quantization combinations; (ii) an inter-layer optimization that assigns bit-width precision and rank to each layer while ensuring the memory constraint is met. An optional final step applies a sequential optimization process using a modified…
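
The two stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the symmetric per-tensor quantizer, the truncated-SVD factorization, squared reconstruction error as the per-layer cost, the sum of those errors as the global objective, and exhaustive search in place of whatever solver the paper uses. All function names (`quantize`, `layer_candidates`, `pareto_front`, `assign`) are hypothetical.

```python
import itertools
import numpy as np

def quantize(w, bits):
    """Symmetric uniform per-tensor quantization (assumed quantizer, for illustration)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def layer_candidates(w, ranks, bit_widths):
    """List (memory_bits, error, rank, bits) for every rank/bit-width pair.

    rank=None means the dense matrix is quantized without factorization.
    """
    m, n = w.shape
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    cands = []
    for bits in bit_widths:
        err = float(np.sum((w - quantize(w, bits)) ** 2))
        cands.append((m * n * bits, err, None, bits))
        for r in ranks:
            a = u[:, :r] * s[:r]   # m x r factor (singular values folded in)
            b = vt[:r]             # r x n factor
            w_hat = quantize(a, bits) @ quantize(b, bits)
            err = float(np.sum((w - w_hat) ** 2))
            cands.append((r * (m + n) * bits, err, r, bits))
    return cands

def pareto_front(cands):
    """Intra-layer stage: drop candidates dominated in (memory, error)."""
    front, best_err = [], float("inf")
    for c in sorted(cands, key=lambda c: (c[0], c[1])):
        if c[1] < best_err:        # keep only if it improves on all cheaper options
            front.append(c)
            best_err = c[1]
    return front

def assign(fronts, budget_bits):
    """Inter-layer stage: pick one candidate per layer within the memory budget.

    Exhaustive search stands in for a real solver; it only scales to tiny examples.
    """
    best = None
    for combo in itertools.product(*fronts):
        mem = sum(c[0] for c in combo)
        err = sum(c[1] for c in combo)
        if mem <= budget_bits and (best is None or err < best[1]):
            best = (combo, err, mem)
    return best

# Toy usage: two random "layers" and a budget of half the dense 8-bit size.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 16)), rng.standard_normal((16, 32))]
fronts = [pareto_front(layer_candidates(w, ranks=(2, 4, 8), bit_widths=(2, 4, 8)))
          for w in layers]
budget = sum(w.size * 8 for w in layers) // 2
choice = assign(fronts, budget)
for (mem, err, rank, bits) in choice[0]:
    print(f"rank={rank} bits={bits} memory_bits={mem} error={err:.3f}")
```

Pruning each layer to its Pareto front before the inter-layer search is what keeps the second stage tractable in this sketch: a candidate that costs more memory and has higher error than another can never be part of the best assignment, so it is safe to discard it early.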
