RanDeS: Randomized Delta Superposition for Multi-Model Compression
Hangyu Zhou, Aaron Gokaslan, Volodymyr Kuleshov, Bharath Hariharan

TL;DR
This paper introduces RanDeS, a multi-model compression method that uses randomized transformations to reduce interference among model deltas, enabling flexible, memory-efficient serving of multiple models across vision and language tasks.
Contribution
RanDeS reformulates model merging as a compress-and-retrieve scheme and applies random orthogonal transformations to reduce interference among deltas. Because each transformation is fully defined by its random seed, adding a model requires no extra memory.
Findings
Significantly reduces task interference in multi-model merging.
Improves performance on vision and language tasks.
Supports easy addition/removal of models with minimal compute.
Abstract
From a multi-model compression perspective, model merging enables memory-efficient serving of multiple models fine-tuned from the same base, but suffers from degraded performance due to interference among their task-specific parameter adjustments (i.e., deltas). In this paper, we reformulate model merging as a compress-and-retrieve scheme, revealing that the task interference arises from the summation of irrelevant deltas during model retrieval. To address this issue, we use random orthogonal transformations to decorrelate these vectors into self-cancellation. We show that this approach drastically reduces interference, improving performance across both vision and language tasks. Since these transformations are fully defined by random seeds, adding new models requires no extra memory. Further, their data- and model-agnostic nature enables easy addition or removal of models with minimal…
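The compress-and-retrieve idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: this summary does not say which transformations the paper uses, so a random sign flip (a diagonal matrix of ±1 entries, which is orthogonal and its own inverse) is assumed here as one simple choice. The function names, the synthetic deltas, and the seed values are all hypothetical. The sketch compares how strongly the leftover interference aligns with the target delta under plain summation and under seed-defined random transforms.

```python
import math
import random


def sign_vector(seed, dim):
    """Regenerate a random +/-1 diagonal from its seed (only the seed is stored)."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(dim)]


def compress(deltas, seeds):
    """Superpose the sign-flipped deltas into one vector of the same size."""
    dim = len(deltas[0])
    merged = [0.0] * dim
    for delta, seed in zip(deltas, seeds):
        signs = sign_vector(seed, dim)
        for i in range(dim):
            merged[i] += signs[i] * delta[i]
    return merged


def retrieve(merged, seed):
    """Undo one model's transform; a sign flip is its own inverse."""
    signs = sign_vector(seed, len(merged))
    return [s * m for s, m in zip(signs, merged)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def make_correlated_deltas(num_models, dim, seed=0):
    """Toy deltas sharing a common component, so plain summation interferes."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [[s + 0.5 * rng.gauss(0.0, 1.0) for s in shared]
            for _ in range(num_models)]


def interference_alignment(deltas, seeds, target):
    """Cosine between the target delta and the interference left after retrieval.

    Returns (plain summation, randomized superposition).
    """
    dim = len(deltas[0])
    naive_sum = [sum(d[i] for d in deltas) for i in range(dim)]
    naive_noise = [naive_sum[i] - deltas[target][i] for i in range(dim)]
    retrieved = retrieve(compress(deltas, seeds), seeds[target])
    random_noise = [retrieved[i] - deltas[target][i] for i in range(dim)]
    return (cosine(deltas[target], naive_noise),
            cosine(deltas[target], random_noise))


# Hypothetical setup: four models, one arbitrary seed per model.
deltas = make_correlated_deltas(num_models=4, dim=2000)
seeds = [101, 102, 103, 104]
naive, randomized = interference_alignment(deltas, seeds, target=0)
print(f"plain summation alignment: {naive:.3f}")
print(f"randomized alignment:      {randomized:.3f}")
```

In this toy setting the interference under plain summation points largely in the same direction as the target delta, while after the random sign flips it is close to uncorrelated with it. Only the seeds need to be kept alongside the merged vector, which mirrors the abstract's point that adding a model requires no extra memory.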
Taxonomy
Topics: Advanced Data Compression Techniques · Natural Language Processing Techniques · Algorithms and Data Compression
