Fine-Tuning, Quantization, and LLMs: Navigating Unintended Outcomes

Divyanshu Kumar; Anurakt Kumar; Sahil Agarwal; Prashanth Harshangi

arXiv:2404.04392 · cs.CR · September 10, 2024 · 2 cites

Open Access

TL;DR

This paper examines how fine-tuning and quantization affect the safety vulnerabilities of large language models, revealing that fine-tuning often increases jailbreak success, while guardrails improve safety.

Contribution

It provides a comprehensive analysis of the impact of fine-tuning and quantization on LLM safety, offering insights for developing more robust safety measures.

Findings

01. Fine-tuning generally increases jailbreak attack success rates.

02. Quantization has mixed effects on model vulnerability.

03. Implementing guardrails significantly improves resistance to jailbreaks.
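The findings above are stated in terms of jailbreak attack success rate. As a rough illustration of that metric, the sketch below computes the fraction of attack attempts judged successful for each model variant. The `AttackResult` record, the variant labels, and the toy outcomes are all hypothetical and made up for this example; they are not the paper's evaluation harness or its data.

```python
from dataclasses import dataclass


@dataclass
class AttackResult:
    """One jailbreak attempt against one model variant (hypothetical record)."""
    model_variant: str  # illustrative labels such as "base" or "fine-tuned"
    jailbroken: bool    # True if a judge marked the response as a successful jailbreak


def attack_success_rate(results, variant):
    """Return the fraction of attempts against `variant` that were judged successful."""
    attempts = [r for r in results if r.model_variant == variant]
    if not attempts:
        raise ValueError(f"no attempts recorded for variant {variant!r}")
    return sum(r.jailbroken for r in attempts) / len(attempts)


# Toy, made-up outcomes used only to exercise the function.
# These are NOT results from the paper.
toy_results = [
    AttackResult("base", False),
    AttackResult("base", False),
    AttackResult("base", True),
    AttackResult("base", False),
    AttackResult("fine-tuned", True),
    AttackResult("fine-tuned", True),
    AttackResult("fine-tuned", False),
    AttackResult("fine-tuned", True),
]

for variant in ("base", "fine-tuned"):
    rate = attack_success_rate(toy_results, variant)
    print(f"{variant}: {rate:.2f}")
```

Comparing the same rate across a base model, its fine-tuned or quantized variants, and a guardrail-protected deployment is one way the three findings could be expressed on a common scale, assuming the same attack prompts and the same judge are used for every variant.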

Abstract

Large Language Models (LLMs) have gained widespread adoption across various domains, including chatbots and auto-task completion agents. However, these models are susceptible to safety vulnerabilities such as jailbreaking, prompt injection, and privacy leakage attacks. These vulnerabilities can lead to the generation of malicious content, unauthorized actions, or the disclosure of confidential information. While foundational LLMs undergo alignment training and incorporate safety measures, they are often subsequently fine-tuned, or quantized for deployment in resource-constrained environments. This study investigates the impact of these modifications on LLM safety, a critical consideration for building reliable and secure AI systems. We evaluate foundational models including Mistral, Llama series, Qwen, and MosaicML, along with their fine-tuned variants. Our comprehensive analysis reveals that…

Taxonomy

Topics: Cryptography and Data Security · Security and Verification in Computing

Methods: LLaMA