SeMe: Training-Free Language Model Merging via Semantic Alignment
Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

TL;DR
SeMe introduces a training-free, data-free method for merging language models at a semantic level, improving robustness, performance, and interpretability without retraining or external data.
Contribution
SeMe is the first to enable layer-wise, semantic-based merging of LMs without data or training, addressing limitations of prior parameter averaging methods.
Findings
Outperforms existing merging techniques in accuracy and efficiency
Preserves internal knowledge and model behaviors effectively
Works across diverse architectures and tasks
Abstract
Despite the remarkable capabilities of Language Models (LMs) across diverse tasks, no single model consistently outperforms others, necessitating efficient methods to combine their strengths without expensive retraining. Existing model merging techniques, such as parameter averaging and task-guided fusion, often rely on data-dependent computations or fail to preserve internal knowledge, limiting their robustness and scalability. We introduce SeMe (Semantic-based Merging), a novel, data-free, and training-free approach that leverages latent semantic alignment to merge LMs at a fine-grained, layer-wise level. Unlike prior work, SeMe not only preserves model behaviors but also explicitly stabilizes internal knowledge, addressing a critical gap in LM fusion. Through extensive experiments across diverse architectures and tasks, we demonstrate that SeMe outperforms existing methods in both…
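For readers unfamiliar with the parameter-averaging baseline that SeMe is positioned against, here is a minimal sketch of its simplest layer-wise form. It is not SeMe: the semantic alignment procedure is not described in this summary and is not reproduced here. The representation (each model as a dict mapping a layer name to a flat list of floats) and the function name `average_parameters` are hypothetical. The sketch also assumes that all models share the same architecture, with the same layers and shapes.

```python
from typing import Dict, List, Optional

# Hypothetical representation: layer name -> flat list of parameter values.
# Real LMs store tensors; flat lists keep this sketch dependency-free.
Params = Dict[str, List[float]]


def average_parameters(models: List[Params],
                       weights: Optional[List[float]] = None) -> Params:
    """Baseline parameter averaging (NOT SeMe): merge layer by layer by
    taking a weighted mean of corresponding parameter values."""
    if not models:
        raise ValueError("need at least one model")
    if weights is None:
        # Uniform weights give plain averaging.
        weights = [1.0 / len(models)] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    merged: Params = {}
    for layer in models[0]:
        # Assumes identical architectures: every model has this layer.
        columns = [m[layer] for m in models]
        if any(len(c) != len(columns[0]) for c in columns):
            raise ValueError(f"shape mismatch in layer {layer!r}")
        # Weighted mean of each parameter position, normalized by total weight.
        merged[layer] = [
            sum(w * values[i] for w, values in zip(weights, columns)) / total
            for i in range(len(columns[0]))
        ]
    return merged


model_a = {"layer0": [1.0, 2.0], "layer1": [0.0]}
model_b = {"layer0": [3.0, 4.0], "layer1": [2.0]}
print(average_parameters([model_a, model_b]))
# {'layer0': [2.0, 3.0], 'layer1': [1.0]}
```

This arithmetic ignores what each parameter encodes. Two models can store the same knowledge in differently arranged parameters, and averaging them position by position can blur that knowledge. That is the kind of limitation the abstract attributes to prior averaging methods.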
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
