Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning
Essa Jan, Nouar AlDahoul, Moiz Ali, Faizan Ahmad, Fareed Zaffar, Yasir Zaki

TL;DR
This paper investigates how fine-tuning LLMs on various tasks can weaken safety measures, revealing task-specific vulnerabilities and proposing a new multitask safety dataset to improve robustness.
Contribution
It uncovers safety degradation patterns across tasks and introduces a multitask safety dataset to enhance safety robustness in LLMs.
Findings
Fine-tuning for code generation and translation causes the most safety degradation.
LLMs show weaker safety guardrails for translation and classification tasks.
Existing safety solutions lack robustness across different tasks.
Abstract
Recent breakthroughs in Large Language Models (LLMs) have led to their adoption across a wide range of tasks, from code generation to machine translation and sentiment analysis. Red-teaming and safety-alignment efforts show that fine-tuning models on benign (non-harmful) data can compromise safety. However, it remains unclear to what extent this phenomenon is influenced by different variables, including the fine-tuning task and model calibrations. This paper explores the task-wise safety degradation caused by fine-tuning on downstream tasks such as summarization, code generation, translation, and classification across various calibrations. Our results reveal that: 1) Fine-tuning LLMs for code generation and translation leads to the highest degradation in safety guardrails. 2) LLMs generally have weaker guardrails for translation and classification, with 73-92% of harmful prompts…
Taxonomy
Topics: VLSI and Analog Circuit Testing · Smart Grid Security and Resilience · Security and Verification in Computing
