HM3: Heterogeneous Multi-Class Model Merging

Stefan Hackmann

arXiv:2409.19173 · cs.CL · October 1, 2024

TL;DR

This paper introduces HM3, a training-free method for merging multi-class classifiers with heterogeneous label spaces into a single model. Merging reduces inference cost while maintaining or improving performance, particularly for the guard models that accompany language-model deployments.

Contribution

HM3 presents a novel training-free technique for merging multi-class classifiers with different label spaces, simplifying deployment and reducing inference time.
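This page does not spell out the mechanics of the merge, so the following is only a minimal sketch of one way classifiers with different label spaces could be combined without training. It assumes (these are illustrative assumptions, not details confirmed here) that each classification head is zero-padded into the union of all label spaces and that the padded heads are then averaged uniformly. The helper names `expand_head`, `average_matrices`, and `merge_heads`, and the toy label sets, are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = labels, columns = hidden features


def expand_head(head: Matrix, offset: int, total_labels: int) -> Matrix:
    """Zero-pad a head so its rows occupy [offset, offset + len(head)) of the union label space."""
    hidden = len(head[0])
    expanded = [[0.0] * hidden for _ in range(total_labels)]
    for i, row in enumerate(head):
        expanded[offset + i] = list(row)
    return expanded


def average_matrices(mats: List[Matrix]) -> Matrix:
    """Element-wise uniform average of equally shaped matrices (an assumed merge rule)."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / n for c in range(cols)] for r in range(rows)]


def merge_heads(
    heads: Dict[str, Matrix], labels: Dict[str, List[str]]
) -> Tuple[List[str], Matrix]:
    """Build the union label list and a single merged head from per-task heads."""
    total = sum(len(names) for names in labels.values())
    merged_labels: List[str] = []
    expanded: List[Matrix] = []
    offset = 0
    for task, head in heads.items():
        expanded.append(expand_head(head, offset, total))
        merged_labels += [f"{task}/{name}" for name in labels[task]]
        offset += len(labels[task])
    return merged_labels, average_matrices(expanded)


# Toy example: a 2-class and a 3-class head over the same 2-dimensional hidden state.
heads = {
    "toxicity": [[1.0, 0.0], [0.0, 1.0]],
    "topic": [[2.0, 2.0], [4.0, 0.0], [0.0, 6.0]],
}
labels = {
    "toxicity": ["safe", "toxic"],
    "topic": ["finance", "health", "other"],
}
merged_labels, merged_head = merge_heads(heads, labels)
```

In this sketch the merged head has one row per label in the union space, so a single forward pass yields logits for every task. At prediction time one would take the argmax separately within each task's slice of the logits. Backbone weights would need their own merge step, which is omitted here.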

Findings

01 Merged models achieve F1-scores comparable to or higher than those of the source models.

02 Inference time is reduced by up to 44%.

03 Self-merging benefits poorly performing classifiers.

Abstract

Foundation language model deployments often include auxiliary guard-rail models to filter or classify text, detecting jailbreak attempts, biased or toxic output, or ensuring topic adherence. These additional models increase the complexity and cost of model inference, especially since many are also large language models. To address this issue, we explore training-free model merging techniques to consolidate these models into a single, multi-functional model. We propose Heterogeneous Multi-Class Model Merging (HM3) as a simple technique for merging multi-class classifiers with heterogeneous label spaces. Unlike parameter-efficient fine-tuning techniques like LoRA, which require extensive training and add complexity during inference, recent advancements allow models to be merged in a training-free manner. We report promising results for merging BERT-based guard models, some of which attain…

Taxonomy

Topics: Speech Recognition and Synthesis