Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning

Essa Jan; Nouar AlDahoul; Moiz Ali; Faizan Ahmad; Fareed Zaffar; Yasir Zaki

arXiv:2409.15361·cs.CL·September 25, 2024

Open Access

TL;DR

This paper investigates how fine-tuning LLMs on various tasks can weaken safety measures, revealing task-specific vulnerabilities and proposing a new multitask safety dataset to improve robustness.

Contribution

It uncovers safety degradation patterns across tasks and introduces a multitask safety dataset to enhance safety robustness in LLMs.

Findings

01

Fine-tuning for code generation and translation causes the most safety degradation.

02

LLMs show weaker safety guardrails for translation and classification tasks.

03

Existing safety solutions lack robustness across different tasks.

Abstract

Recent breakthroughs in Large Language Models (LLMs) have led to their adoption across a wide range of tasks, from code generation to machine translation and sentiment analysis. Red-teaming and safety-alignment efforts show that fine-tuning models on benign (non-harmful) data can compromise safety. However, it remains unclear to what extent this phenomenon is influenced by different variables, including the fine-tuning task and model calibrations. This paper explores task-wise safety degradation due to fine-tuning on downstream tasks such as summarization, code generation, translation, and classification across various calibrations. Our results reveal that: 1) Fine-tuning LLMs for code generation and translation leads to the highest degradation in safety guardrails. 2) LLMs generally have weaker guardrails for translation and classification, with 73-92% of harmful prompts…
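One way to picture the task-wise comparison described in the abstract is as a per-task rate of harmful prompts that a model answers, measured before and after fine-tuning. The sketch below is only an illustration under stated assumptions: the function names, the record format, and the toy labels are hypothetical and are not taken from the paper, which may score safety differently.

```python
from collections import defaultdict


def answered_rate_by_task(records):
    """Return, per task, the fraction of harmful prompts the model answered.

    `records` is an iterable of (task, was_answered) pairs. This record
    format is an assumption made for illustration, not the paper's format.
    """
    totals = defaultdict(int)
    answered = defaultdict(int)
    for task, was_answered in records:
        totals[task] += 1
        answered[task] += int(was_answered)
    return {task: answered[task] / totals[task] for task in totals}


def degradation(before, after):
    """Per-task change in answered rate; positive means weaker guardrails."""
    return {task: after[task] - before[task] for task in before if task in after}


# Toy, made-up labels (NOT results from the paper).
base = [
    ("translation", True), ("translation", False),
    ("summarization", False), ("summarization", False),
]
tuned = [
    ("translation", True), ("translation", True),
    ("summarization", True), ("summarization", False),
]

base_rates = answered_rate_by_task(base)
tuned_rates = answered_rate_by_task(tuned)
delta = degradation(base_rates, tuned_rates)
```

Keeping the rates per task, rather than pooling all prompts, is what would let a comparison like this surface differences between tasks such as translation and summarization.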

Taxonomy

Topics: VLSI and Analog Circuit Testing · Smart Grid Security and Resilience · Security and Verification in Computing