RanDeS: Randomized Delta Superposition for Multi-Model Compression

Hangyu Zhou; Aaron Gokaslan; Volodymyr Kuleshov; Bharath Hariharan

arXiv:2505.11204·cs.LG·May 19, 2025

TL;DR

This paper introduces RanDeS, a multi-model compression method that applies random orthogonal transformations to model deltas to reduce interference among them, enabling flexible, memory-efficient serving of multiple fine-tuned models across vision and language tasks.

Contribution

RanDeS reformulates model merging as a compress-and-retrieve scheme and applies random orthogonal transformations to reduce interference among deltas. Because each transformation is fully defined by a random seed, adding a model requires no extra memory.

Findings

01 Significantly reduces task interference in multi-model merging.

02 Improves performance on vision and language tasks.

03 Supports easy addition/removal of models with minimal compute.
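
The third finding can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the class name `DeltaStore`, its methods, and the choice of a seeded random sign flip as the orthogonal transformation are all hypothetical. The sketch also assumes that the delta being removed is available again at removal time.

```python
import numpy as np


class DeltaStore:
    """Hypothetical store that keeps one superposed vector for many deltas."""

    def __init__(self, dim):
        self.dim = dim
        self.superposed = np.zeros(dim)
        self.seeds = set()

    def _signs(self, seed):
        # A random +/-1 vector acts as a diagonal orthogonal matrix.
        # It is regenerated from the seed, so it is never stored.
        rng = np.random.default_rng(seed)
        return rng.choice(np.array([-1.0, 1.0]), size=self.dim)

    def add(self, seed, delta):
        # Adding a model costs one elementwise product and one addition.
        self.superposed += self._signs(seed) * delta
        self.seeds.add(seed)

    def remove(self, seed, delta):
        # Assumption: the delta to remove is supplied by the caller.
        self.superposed -= self._signs(seed) * delta
        self.seeds.discard(seed)

    def retrieve(self, seed):
        # A sign flip is its own inverse, so the same vector undoes it.
        return self._signs(seed) * self.superposed


rng = np.random.default_rng(0)
delta_a = rng.normal(size=1000)
delta_b = rng.normal(size=1000)

store = DeltaStore(dim=1000)
store.add(seed=1, delta=delta_a)
store.add(seed=2, delta=delta_b)
store.remove(seed=2, delta=delta_b)

# With only model 1 left in the store, retrieval is exact
# up to floating-point error.
recovered_a = store.retrieve(seed=1)
```

In this sketch neither operation touches the other stored models or any training data, which is one way to read the "minimal compute" claim.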

Abstract

From a multi-model compression perspective, model merging enables memory-efficient serving of multiple models fine-tuned from the same base, but suffers from degraded performance due to interference among their task-specific parameter adjustments (i.e., deltas). In this paper, we reformulate model merging as a compress-and-retrieve scheme, revealing that the task interference arises from the summation of irrelevant deltas during model retrieval. To address this issue, we use random orthogonal transformations to decorrelate these vectors into self-cancellation. We show that this approach drastically reduces interference, improving performance across both vision and language tasks. Since these transformations are fully defined by random seeds, adding new models requires no extra memory. Further, their data- and model-agnostic nature enables easy addition or removal of models with minimal…
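
The compress-and-retrieve idea in the abstract can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function names `sign_vector`, `compress`, and `retrieve` are hypothetical, the deltas are synthetic, and a seeded random sign flip (a diagonal orthogonal matrix) stands in for whatever orthogonal transformations the paper actually uses.

```python
import numpy as np


def sign_vector(seed, dim):
    """Random +/-1 vector; diag(signs) is orthogonal and defined by the seed."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1.0, 1.0]), size=dim)


def compress(deltas, seeds):
    """Superpose the transformed deltas into a single vector."""
    dim = deltas[0].shape[0]
    return sum(sign_vector(s, dim) * d for s, d in zip(seeds, deltas))


def retrieve(superposed, seed):
    """Apply the inverse transform for one model (a sign flip inverts itself)."""
    return sign_vector(seed, superposed.shape[0]) * superposed


rng = np.random.default_rng(0)
dim, n_models = 10_000, 8

# Synthetic, correlated deltas: a shared direction plus task-specific noise.
shared = rng.normal(size=dim)
deltas = [shared + 0.5 * rng.normal(size=dim) for _ in range(n_models)]
seeds = list(range(100, 100 + n_models))

# Plain summation: retrieving model 0 also returns every other delta as is.
interference_naive = np.linalg.norm(sum(deltas) - deltas[0])

# Randomized superposition: the other deltas come back with random signs,
# so their correlated components tend to cancel instead of accumulating.
superposed = compress(deltas, seeds)
recovered = retrieve(superposed, seeds[0])
interference_randomized = np.linalg.norm(recovered - deltas[0])

print(f"naive interference norm:      {interference_naive:.1f}")
print(f"randomized interference norm: {interference_randomized:.1f}")
```

In this toy setting the residual after retrieval is smaller than under plain summation but not zero. Only the seeds need to be kept alongside the superposed vector, which matches the abstract's point that the transformations are fully defined by random seeds.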

Code & Models

Repositories

zhou-hangyu/randes (PyTorch, official)

Taxonomy

Topics: Advanced Data Compression Techniques · Natural Language Processing Techniques · Algorithms and Data Compression