HM3: Heterogeneous Multi-Class Model Merging
Stefan Hackmann

TL;DR
This paper introduces HM3, a training-free method to merge heterogeneous multi-class models into a single model, reducing inference costs and maintaining or improving performance, especially for guard models in language applications.
Contribution
HM3 presents a novel training-free technique for merging multi-class classifiers with different label spaces, simplifying deployment and reducing inference time.
Findings
Merged models achieve F1-scores comparable to or higher than those of the source models.
Inference time is reduced by up to 44%.
Self-merging benefits poorly performing classifiers.
Abstract
Foundation language model deployments often include auxiliary guard-rail models to filter or classify text, detecting jailbreak attempts, biased or toxic output, or ensuring topic adherence. These additional models increase the complexity and cost of model inference, especially since many are also large language models. To address this issue, we explore training-free model merging techniques to consolidate these models into a single, multi-functional model. We propose Heterogeneous Multi-Class Model Merging (HM3) as a simple technique for merging multi-class classifiers with heterogeneous label spaces. Unlike parameter-efficient fine-tuning techniques like LoRA, which require extensive training and add complexity during inference, recent advancements allow models to be merged in a training-free manner. We report promising results for merging BERT-based guard models, some of which attain…
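As a rough illustration of what merging classifiers with heterogeneous label spaces can involve, the toy sketch below embeds each classification head into a combined label space by zero-padding and then averages the padded heads. It is a sketch under stated assumptions, not the paper's implementation: the function names (`pad_head`, `merge_heads`, `predict`) are hypothetical, plain weight averaging stands in for whichever merging technique is actually used, and only the output heads are shown, with the input features assumed to come from a shared encoder.

```python
import numpy as np

def pad_head(weight, bias, offset, total_labels):
    """Embed an (n_labels, hidden) head into a (total_labels, hidden) head.

    Rows outside [offset, offset + n_labels) stay zero.
    """
    n_labels, hidden = weight.shape
    w = np.zeros((total_labels, hidden), dtype=weight.dtype)
    b = np.zeros(total_labels, dtype=bias.dtype)
    w[offset:offset + n_labels] = weight
    b[offset:offset + n_labels] = bias
    return w, b

def merge_heads(heads):
    """Merge a list of (weight, bias) heads with different label counts.

    Assumption: the combined label space is the concatenation of the
    source label spaces, and merging is a plain average of padded heads.
    Returns the merged weight, merged bias, and one slice per source head.
    """
    total = sum(w.shape[0] for w, _ in heads)
    padded, slices, offset = [], [], 0
    for w, b in heads:
        padded.append(pad_head(w, b, offset, total))
        slices.append(slice(offset, offset + w.shape[0]))
        offset += w.shape[0]
    merged_w = np.mean([w for w, _ in padded], axis=0)
    merged_b = np.mean([b for _, b in padded], axis=0)
    return merged_w, merged_b, slices

def predict(merged_w, merged_b, slices, features):
    """One forward pass through the merged head, then one argmax per task."""
    logits = merged_w @ features + merged_b
    return [int(np.argmax(logits[s])) for s in slices]
```

Because each source head occupies its own block of rows, averaging only rescales that block by a positive constant, so the per-task argmax in this toy setup matches what the original head would have produced on the same features. A single pass through the merged head yields predictions for every task, which is the deployment simplification the abstract describes.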
Taxonomy
Topics: Speech Recognition and Synthesis
