Efficiency vs. Alignment: Investigating Safety and Fairness Risks in Parameter-Efficient Fine-Tuning of LLMs

Mina Taraghi, Yann Pequignot, Amin Nikanjam, Mohamed Amine Merzouk, and Foutse Khomh

arXiv:2511.00382 · cs.AI · November 4, 2025

TL;DR

This study systematically evaluates how different parameter-efficient fine-tuning methods affect safety and fairness in large language models, revealing trade-offs and guiding safer deployment practices.

Contribution

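The paper compares four parameter-efficient fine-tuning (PEFT) methods (LoRA, IA3, Prompt-Tuning, and P-Tuning) across four instruction-tuned model families, eleven safety hazard categories, and nine demographic fairness dimensions, showing that each method affects safety and fairness differently.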

Findings

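01. Adapter-based methods (LoRA, IA3) tend to improve safety and better retain fairness.
02. Prompt-based methods (Prompt-Tuning, P-Tuning) tend to decrease safety and increase bias.
03. The base model family moderates alignment shifts, so the same method can behave differently on different models.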
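
The findings describe alignment shifts: how safety and bias scores move after fine-tuning, broken down by PEFT method and by base model. A minimal sketch of that kind of aggregation follows, assuming each fine-tuned variant has one score per category for both the base model and the tuned model, and that all scores passed in one call are of the same kind (e.g. only safety scores). The record layout, the helper names, and the plain mean of per-category differences are hypothetical illustrations, not necessarily the metric used in the study.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Hashable, Iterable


def shift(base: dict[str, float], tuned: dict[str, float]) -> dict[str, float]:
    """Per-category change, tuned minus base (positive means the score rose)."""
    if base.keys() != tuned.keys():
        raise ValueError("base and tuned scores must cover the same categories")
    return {cat: tuned[cat] - base[cat] for cat in base}


def mean_shift(runs: Iterable[dict], key: Callable[[dict], Hashable]) -> dict:
    """Average all per-category shifts over the runs that share key(run).

    Each run is a hypothetical record with "method", "base_model",
    "base_scores", and "tuned_scores" entries.
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[key(run)].extend(shift(run["base_scores"], run["tuned_scores"]).values())
    return {group: mean(values) for group, values in grouped.items()}


def by_method(run: dict) -> str:
    # Pools every base model together, so only the PEFT method is compared.
    return run["method"]


def by_method_and_base(run: dict) -> tuple[str, str]:
    # Keeping the base model in the key exposes moderation by model family.
    return run["method"], run["base_model"]
```

Calling `mean_shift(runs, by_method)` compares methods, while `mean_shift(runs, by_method_and_base)` shows whether the same method moves scores differently depending on the base model.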

Abstract

Organizations are increasingly adopting and adapting Large Language Models (LLMs) hosted on public repositories such as HuggingFace. Although these adaptations often improve performance on specialized downstream tasks, recent evidence indicates that they can also degrade a model's safety or fairness. Since different fine-tuning techniques may exert distinct effects on these critical dimensions, this study undertakes a systematic assessment of their trade-offs. Four widely used Parameter-Efficient Fine-Tuning methods, LoRA, IA3, Prompt-Tuning, and P-Tuning, are applied to four instruction-tuned model families (Meta-Llama-3-8B, Qwen2.5-7B, Mistral-7B, and Gemma-7B). In total, 235 fine-tuned variants are evaluated across eleven safety hazard categories and nine demographic fairness dimensions. The results show that adapter-based approaches (LoRA, IA3) tend to improve safety scores and are…
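
The abstract and findings group LoRA and IA3 as adapter-based methods and Prompt-Tuning and P-Tuning as prompt-based methods. As a rough illustration of where each family places its trainable parameters, the sketch below counts them for a single projection matrix. The layer size, LoRA rank, prompt length, and the two-layer MLP prompt encoder used for P-Tuning are illustrative assumptions, not the study's configuration, and the code does not call any fine-tuning library.

```python
from dataclasses import dataclass

# Grouping used in the findings: adapter-based vs prompt-based methods.
ADAPTER_BASED = frozenset({"LoRA", "IA3"})
PROMPT_BASED = frozenset({"Prompt-Tuning", "P-Tuning"})


@dataclass(frozen=True)
class Linear:
    """Shape of one frozen weight matrix W, with W.shape == (d_out, d_in)."""
    d_out: int
    d_in: int


def full_params(layer: Linear) -> int:
    # Full fine-tuning would update every entry of W.
    return layer.d_out * layer.d_in


def lora_params(layer: Linear, rank: int) -> int:
    # LoRA keeps W frozen and learns a low-rank update B @ A,
    # with B of shape (d_out, rank) and A of shape (rank, d_in).
    return rank * (layer.d_out + layer.d_in)


def ia3_params(layer: Linear) -> int:
    # IA3 learns one vector that rescales the layer's d_out output activations elementwise.
    return layer.d_out


def prompt_tuning_params(n_virtual: int, d_model: int) -> int:
    # Prompt-Tuning learns n_virtual soft-prompt embeddings prepended to the input.
    return n_virtual * d_model


def p_tuning_params(n_virtual: int, d_model: int, hidden: int) -> int:
    # P-Tuning also learns virtual-token embeddings but feeds them through a prompt
    # encoder; here a hypothetical 2-layer MLP (d_model -> hidden -> d_model, with biases).
    encoder = (d_model * hidden + hidden) + (hidden * d_model + d_model)
    return n_virtual * d_model + encoder


def family(method: str) -> str:
    if method in ADAPTER_BASED:
        return "adapter-based"
    if method in PROMPT_BASED:
        return "prompt-based"
    raise ValueError(f"unknown PEFT method: {method}")


if __name__ == "__main__":
    # Hypothetical settings: one 4096 x 4096 projection, LoRA rank 8,
    # 20 virtual tokens, and a 128-unit prompt encoder.
    layer = Linear(d_out=4096, d_in=4096)
    counts = {
        "LoRA": lora_params(layer, rank=8),                 # per adapted matrix
        "IA3": ia3_params(layer),                           # per rescaled activation
        "Prompt-Tuning": prompt_tuning_params(20, 4096),    # whole model
        "P-Tuning": p_tuning_params(20, 4096, hidden=128),  # whole model, training time
    }
    print(f"full fine-tuning of this one matrix: {full_params(layer):,} parameters")
    for name, n in counts.items():
        print(f"{name:<14}{family(name):<15}{n:>12,}")
```

Adapter-based counts grow with the number of matrices that receive an adapter, whereas prompt-based counts are fixed per model because they only touch the input sequence; a faithful comparison would plug in the configurations actually used in the study.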

Taxonomy

Topics: Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education · Mobile Crowdsensing and Crowdsourcing