SeMe: Training-Free Language Model Merging via Semantic Alignment

Jian Gu; Aldeida Aleti; Chunyang Chen; Hongyu Zhang

arXiv:2505.20144 · cs.CL · May 27, 2025

Open Access

TL;DR

SeMe introduces a training-free, data-free method for merging language models at a semantic level, improving robustness, performance, and interpretability without retraining or external data.

Contribution

SeMe is the first to enable layer-wise, semantic-based merging of LMs without data or training, addressing limitations of prior parameter averaging methods.
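
For contrast, the parameter averaging baseline mentioned above can be sketched as follows. This is a minimal illustration of uniform weight averaging only, not SeMe's semantic alignment procedure, which this summary does not specify. The function name `average_parameters` and the representation of a model as a dictionary of flat float lists are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat weights.
StateDict = Dict[str, List[float]]


def average_parameters(models: List[StateDict]) -> StateDict:
    """Merge models by element-wise uniform averaging of their parameters."""
    if not models:
        raise ValueError("at least one model is required")

    names = set(models[0].keys())
    for model in models[1:]:
        # Plain averaging only makes sense for identically shaped models.
        if set(model.keys()) != names:
            raise ValueError("models must share the same parameter names")

    merged: StateDict = {}
    for name in models[0]:
        tensors = [model[name] for model in models]
        size = len(tensors[0])
        if any(len(t) != size for t in tensors):
            raise ValueError(f"parameter {name!r} has mismatched sizes")
        # Average the i-th weight of this parameter across all models.
        merged[name] = [sum(values) / len(models) for values in zip(*tensors)]
    return merged


model_a = {"layer0.weight": [1.0, 3.0]}
model_b = {"layer0.weight": [3.0, 5.0]}
merged = average_parameters([model_a, model_b])
```

Averaging of this kind treats each weight position independently and assumes the models are already aligned coordinate by coordinate, which is the sort of limitation the contribution statement attributes to prior methods.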

Findings

01. Outperforms existing merging techniques in accuracy and efficiency

02. Preserves internal knowledge and model behaviors effectively

03. Works across diverse architectures and tasks

Abstract

Despite the remarkable capabilities of Language Models (LMs) across diverse tasks, no single model consistently outperforms others, necessitating efficient methods to combine their strengths without expensive retraining. Existing model merging techniques, such as parameter averaging and task-guided fusion, often rely on data-dependent computations or fail to preserve internal knowledge, limiting their robustness and scalability. We introduce SeMe (Semantic-based Merging), a novel, data-free, and training-free approach that leverages latent semantic alignment to merge LMs at a fine-grained, layer-wise level. Unlike prior work, SeMe not only preserves model behaviors but also explicitly stabilizes internal knowledge, addressing a critical gap in LM fusion. Through extensive experiments across diverse architectures and tasks, we demonstrate that SeMe outperforms existing methods in both…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques