MoD: A Distribution-Based Approach for Merging Large Language Models

Quy-Anh Dang; Chris Ngo

arXiv:2411.00406·cs.LG·November 4, 2024

TL;DR

The paper introduces MoD, a novel distribution-based method for merging large language models that preserves their specialization and improves performance over traditional weight-averaging techniques.

Contribution

MoD is a new framework that merges LLMs by operating on output distributions, enhancing knowledge sharing while maintaining model specialization.

Findings

1. MoD outperforms existing merging methods on mathematical reasoning benchmarks.

2. MoD preserves the specialized capabilities of the individual models during merging.

3. Experiments with Qwen2.5 models show significant gains over existing model merging techniques across multiple benchmarks.

Abstract

Large language models (LLMs) have enabled the development of numerous specialized, task-specific variants. However, the maintenance and deployment of these individual models present substantial challenges in terms of resource utilization and operational efficiency. In this work, we propose the Mixture of Distributions (MoD) framework, a novel approach for merging LLMs that operates directly on their output probability distributions, rather than on model weights. Unlike traditional weight-averaging methods, MoD effectively preserves the specialized capabilities of individual models while enabling efficient knowledge sharing across tasks. Through extensive experimentation on mathematical reasoning benchmarks using Qwen2.5 models, we demonstrate that MoD significantly outperforms existing model merging techniques across multiple benchmarks. All code, data, and experimental…
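
The abstract says MoD merges models at the level of output probability distributions but does not state the mixing rule. As a rough illustration only, the sketch below assumes the simplest possible form: a fixed-weight convex combination of two next-token distributions over a shared vocabulary. The function names, the toy logits, and the weight of 0.7 are all hypothetical and are not taken from the paper or its code.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_distributions(p, q, weight):
    """Convex combination weight * p + (1 - weight) * q.

    Assumption: both distributions are defined over the same vocabulary,
    and `weight` is a fixed scalar in [0, 1] (hypothetical choice).
    """
    if len(p) != len(q):
        raise ValueError("distributions must share a vocabulary")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p, q)]


# Toy next-token logits from two hypothetical specialized models
# over a four-token vocabulary.
math_logits = [2.0, 0.5, -1.0, 0.0]
general_logits = [0.0, 1.5, 0.5, -0.5]

p_math = softmax(math_logits)
p_general = softmax(general_logits)

# Mix in probability space; no model weights are touched.
mixed = mix_distributions(p_math, p_general, weight=0.7)

# Greedy decoding from the mixed distribution.
next_token = max(range(len(mixed)), key=mixed.__getitem__)
```

Because a convex combination of two probability distributions is itself a probability distribution, the mixed output can be sampled or decoded like the output of a single model. Unlike weight averaging, this requires running both models at inference time, and it presumes the models share a tokenizer.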

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques