CNT: Safety-oriented Function Reuse across LLMs via Cross-Model Neuron Transfer

Yue Zhao; Yujia Gong; Ruigang Liang; Shenchen Zhu; Kai Chen; Xuejing Yuan; Wangjun Zhang

arXiv:2603.18449 · cs.CR · March 20, 2026

Open Access

TL;DR

This paper introduces Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-related functionality across large language models by transferring a minimal subset of neurons from an open-source donor model to a target model, adapting the target's safety behavior with minimal performance loss.

Contribution

CNT enables modular safety functionality transfer between LLMs at the neuron level, supporting both addition and deletion of safety features in a post-hoc manner.

Findings

01. Achieves safety functionality transfer with less than 1% performance degradation.

02. Outperforms five baseline methods across multiple safety tasks.

03. Demonstrates generality and effectiveness in diverse LLMs.

Abstract

The widespread deployment of large language models (LLMs) calls for post-hoc methods that can flexibly adapt models to evolving safety requirements. Meanwhile, the rapidly expanding open-source LLM ecosystem has produced a diverse collection of models that already exhibit various safety-related functionalities. This motivates a shift from constructing safety functionality from scratch to reusing existing functionality from external models, thereby avoiding costly data collection and training procedures. In this paper, we present Cross-Model Neuron Transfer (CNT), a post-hoc method that reuses safety-oriented functionality by transferring a minimal subset of neurons from an open-source donor LLM to a target LLM. By operating at the neuron level, CNT enables modular function-level adaptation, supporting both function addition and function deletion. We evaluate CNT on seven popular LLMs…
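As a rough illustration of what "transferring a subset of neurons" from a donor to a target could look like, the toy sketch below copies selected hidden units of a single feed-forward block. It is not the paper's procedure: the excerpt above does not say how CNT selects neurons or aligns models, so the function `transfer_neurons`, the helper `make_block`, the dictionary layout (`w_in`, `b_in`, `w_out`), and the assumption that donor and target share the same shapes are all hypothetical.

```python
import numpy as np


def transfer_neurons(donor, target, neuron_ids):
    """Return a copy of `target` whose selected hidden neurons come from `donor`.

    Hypothetical layout (an assumption, not taken from the paper):
      w_in  : (hidden, d_model)  one row per hidden neuron
      b_in  : (hidden,)          one bias per hidden neuron
      w_out : (d_model, hidden)  one column per hidden neuron
    """
    idx = np.asarray(neuron_ids, dtype=int)
    # Copy first so the original target weights are left untouched.
    patched = {name: w.copy() for name, w in target.items()}
    patched["w_in"][idx, :] = donor["w_in"][idx, :]    # incoming weights
    patched["b_in"][idx] = donor["b_in"][idx]          # biases
    patched["w_out"][:, idx] = donor["w_out"][:, idx]  # outgoing weights
    return patched


def make_block(rng, d_model=4, hidden=8):
    """Build a random toy feed-forward block (illustration only)."""
    return {
        "w_in": rng.normal(size=(hidden, d_model)),
        "b_in": rng.normal(size=hidden),
        "w_out": rng.normal(size=(d_model, hidden)),
    }


rng = np.random.default_rng(0)
donor, target = make_block(rng), make_block(rng)
selected = [1, 5]  # arbitrary indices; a real method would choose these
patched = transfer_neurons(donor, target, selected)

# Rows of w_in that differ from the original target after patching.
changed = np.any(patched["w_in"] != target["w_in"], axis=1)
print(np.flatnonzero(changed))
```

In this sketch only the selected neurons differ from the original target, which is the sense in which a neuron-level edit is modular. How such neurons are identified, and how this supports function addition versus deletion, is left to the full paper.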

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques