NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs

Birong Pan; Mayi Xu; Qiankun Pi; Jianhao Chen; Yuanyuan Zhu; Ming Zhong; Tieyun Qian

arXiv:2508.09473·cs.LG·August 14, 2025

TL;DR

NeuronTune introduces a fine-grained neuron modulation framework for LLMs that enhances safety without sacrificing utility, outperforming existing methods through dynamic, attribution-based neuron adjustments.

Contribution

It presents a novel, fine-grained neuron modulation approach using attribution and meta-learning to improve safety-utility balance in LLMs, addressing limitations of coarse interventions.

Findings

01. Significantly improves safety in LLMs against malicious attacks.
02. Maintains high utility and task performance.
03. Outperforms state-of-the-art safety and utility methods.

Abstract

Ensuring robust safety alignment while preserving utility is critical for the reliable deployment of Large Language Models (LLMs). However, current techniques suffer from intertwined deficiencies: insufficient robustness against malicious attacks, frequent refusal of benign queries, and degradation in generated text quality and general task performance. The first two reflect deficits in robust safety; the last constitutes utility impairment. We trace these limitations to the coarse-grained layer-wise interventions in existing methods. To resolve this, we propose NeuronTune, a fine-grained framework that dynamically modulates sparse neurons to achieve simultaneous safety-utility optimization. Our approach first identifies safety-critical and utility-preserving neurons across all layers via attribution, then employs meta-learning to adaptively amplify safety-neuron…
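
The two-stage idea in the abstract (score neurons by attribution, then modulate a sparse subset) can be illustrated with a toy sketch. This is not the authors' implementation. The attribution rule (mean |activation × gradient|), the function names, the toy data, and the fixed scale of 1.5 are all assumptions made for illustration, and the meta-learning step that the paper uses to adapt the modulation is replaced here by a constant.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = examples, columns = neurons


def attribution_scores(activations: Matrix, gradients: Matrix) -> List[float]:
    """Mean |activation * gradient| per neuron (an assumed attribution rule)."""
    n_examples = len(activations)
    n_neurons = len(activations[0])
    scores = [0.0] * n_neurons
    for acts, grads in zip(activations, gradients):
        for j in range(n_neurons):
            scores[j] += abs(acts[j] * grads[j]) / n_examples
    return scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring neurons, highest first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


def amplify(hidden: Sequence[float], neuron_ids: Sequence[int], scale: float) -> List[float]:
    """Scale only the selected neurons; every other activation is unchanged."""
    selected = set(neuron_ids)
    return [h * scale if j in selected else h for j, h in enumerate(hidden)]


# Toy data (hypothetical): 2 examples x 4 neurons from a safety-labelled batch.
safety_acts = [[1.0, 0.5, 0.0, 2.0], [1.0, 0.5, 0.0, 2.0]]
safety_grads = [[0.1, 0.2, 0.9, 2.0], [0.1, 0.2, 0.9, 1.0]]

scores = attribution_scores(safety_acts, safety_grads)
safety_neurons = top_k(scores, k=1)  # sparse selection: keep one neuron
# Fixed scale stands in for the learned, adaptive factor (assumption).
modulated = amplify([1.0, 1.0, 1.0, 1.0], safety_neurons, scale=1.5)
```

In this toy run only the neuron with the largest attribution score is touched, which is the sense in which the intervention is fine-grained rather than layer-wise. The same scoring could be repeated on a utility-labelled batch to find utility-preserving neurons; how those are treated is not visible in the truncated abstract, so the sketch leaves them alone.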
