Toward Secure Tuning: Mitigating Security Risks from Instruction Fine-Tuning
Yanrui Du, Sendong Zhao, Jiawei Cao, Ming Ma, Danyang Zhao, Shuren Qi, Fenglei Fan, Ting Liu, Bing Qin

TL;DR
This paper introduces SWAT, a secure-tuning strategy for instruction fine-tuning of LLMs. SWAT reduces security risks by first warming up a robust subset of modules (Mods_Rob) and then training all parameters, while maintaining task performance and remaining compatible with existing defenses.
Contribution
The paper proposes SWAT, a novel in-training secure-tuning method that uses module-level analysis to identify robust modules and can be combined with pre- and post-training defenses.
Findings
SWAT effectively mitigates security risks across various datasets and models.
SWAT maintains high task performance while improving security.
The method is compatible with existing pre- and post-training defenses.
Abstract
Instruction fine-tuning has emerged as a critical technique for customizing Large Language Models (LLMs) to specific applications. However, recent studies have highlighted significant security vulnerabilities in fine-tuned LLMs. Existing defense efforts focus mainly on pre-training and post-training methods, while in-training methods remain underexplored. To fill this gap, we introduce a novel secure-tuning strategy called SWAT. By analyzing how module-level parameters (e.g., Q/K/V/O) affect the security feature space drift, we identify a robust subset of modules, termed Mods_Rob. Our SWAT strategy begins by warming up Mods_Rob to capture low-level features with minimal security risks, followed by training all parameters to achieve optimal task performance. Essentially, this strategy shifts the early learning burden more from global parameters to Mods_Rob, reducing update…
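The two-phase schedule described in the abstract (warm up Mods_Rob, then train all parameters) can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the function name `swat_trainable_flags`, the parameter names, the `warmup_steps` value, and the choice of `q_proj`/`k_proj` as the robust subset are all hypothetical placeholders. The abstract does not say which modules make up Mods_Rob or how long the warm-up lasts.

```python
from typing import Dict, Iterable, List


def swat_trainable_flags(
    param_names: Iterable[str],
    robust_keys: List[str],
    step: int,
    warmup_steps: int,
) -> Dict[str, bool]:
    """Return {parameter name: is it updated at this step?}.

    During warm-up (step < warmup_steps) only parameters whose name
    contains one of `robust_keys` are trainable; afterwards all are.
    """
    in_warmup = step < warmup_steps
    flags = {}
    for name in param_names:
        is_robust = any(key in name for key in robust_keys)
        flags[name] = is_robust if in_warmup else True
    return flags


# Hypothetical parameter names for one attention block (assumption).
PARAMS = [
    "layers.0.attn.q_proj.weight",
    "layers.0.attn.k_proj.weight",
    "layers.0.attn.v_proj.weight",
    "layers.0.attn.o_proj.weight",
]
# Placeholder robust subset; NOT the Mods_Rob identified in the paper.
ROBUST = ["q_proj", "k_proj"]

warm = swat_trainable_flags(PARAMS, ROBUST, step=0, warmup_steps=100)
full = swat_trainable_flags(PARAMS, ROBUST, step=100, warmup_steps=100)
```

In a PyTorch training loop, one plausible way to apply such flags would be to set each parameter's `requires_grad` attribute from the returned dictionary at the start of each phase.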
Taxonomy
Topics: Music Technology and Sound Studies
Methods: Balanced Selection
