The Hidden Costs of Domain Fine-Tuning: PII-Bearing Data Degrades Safety and Increases Leakage

Jayesh Choudhari; Piyush Kumar Singh

arXiv:2603.00061 · cs.CR · March 3, 2026

TL;DR

This paper investigates how domain fine-tuning, especially on PII-bearing data, affects safety and privacy in open-source chat models, revealing significant degradation in refusal behavior and increased privacy leakage.

Contribution

It provides a controlled empirical analysis of how training-data composition (with vs. without PII) and fine-tuning configuration (with vs. without role-swapping) affect safety and privacy in small open-source chat models of up to 8B parameters.

Findings

1. Fine-tuning causes a shift from high-quality refusals to harmful compliance.

2. PII in training data significantly increases privacy leakage.

3. Role-swapping partially reduces PII leakage but does not restore safety behaviors.
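The findings above rest on two measurements: how often a model refuses, and how often it leaks PII. The summary does not say how the authors score either, so what follows is only a minimal sketch under stated assumptions. Refusals are detected with a hypothetical keyword list (`REFUSAL_MARKERS`), and leakage is counted as any output that reproduces a PII string known to occur in the training data (`train_pii`). Both function names are illustrative, not the paper's evaluation code.

```python
# Hypothetical refusal markers (assumption): keyword matching is a crude proxy;
# the paper's actual refusal/compliance judging method is not described here.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i won't")


def refusal_rate(outputs: list[str]) -> float:
    """Fraction of outputs containing at least one refusal marker (case-insensitive)."""
    if not outputs:
        return 0.0
    refused = sum(
        1 for text in outputs if any(m in text.lower() for m in REFUSAL_MARKERS)
    )
    return refused / len(outputs)


def leakage_rate(outputs: list[str], train_pii: set[str]) -> float:
    """Fraction of outputs that reproduce verbatim any PII string seen in training.

    Assumption: `train_pii` is a set of exact strings (emails, phone numbers, ...)
    extracted from the fine-tuning data; paraphrased leaks are not detected.
    """
    if not outputs:
        return 0.0
    leaked = sum(1 for text in outputs if any(pii in text for pii in train_pii))
    return leaked / len(outputs)
```

Keyword matching cannot tell a high-quality refusal from a partial one, and it misses curly apostrophes (for example, "I can’t"), so separating refusals from harmful compliance as the first finding describes would need a judge model or human labels.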

Abstract

Domain fine-tuning is a common way to deploy small instruction-tuned language models as customer-support assistants, yet its effects on safety-aligned behavior and privacy are not well understood. In real deployments, such assistants receive a mixture of benign in-domain requests and out-of-domain user queries that are emotional, philosophical, or adversarial. Even when the target domain is benign, specialization may shift model behavior in ways that weaken refusal, increase harmful compliance, and induce privacy leakage. We present a controlled empirical study of how training-data composition (presence vs. removal of PII) and fine-tuning configuration (role-swapping, RS) shape safety and out-of-domain behavior in open-source chat models of up to 8B parameters. We fine-tune each model on 5,000 real booking-support message pairs under three settings: NoPII-NoRS,…
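The "removal of PII" condition implies a preprocessing pass over the booking-support pairs before fine-tuning. The abstract does not describe how that removal was done, so the sketch below assumes simple regex redaction with typed placeholder tokens. The patterns (including the `[BOOKING_ID]` format) and the `{"user", "assistant"}` pair layout are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical redaction rules (assumption): the paper's actual PII-removal
# procedure is not described in the abstract. Order matters: emails are
# replaced before the looser phone-number pattern runs.
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Digits possibly separated by spaces, dots, dashes, or parentheses.
    ("[PHONE]", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
    # Hypothetical booking-reference format: two capital letters + six digits.
    ("[BOOKING_ID]", re.compile(r"\b[A-Z]{2}\d{6}\b")),
]


def redact(text: str) -> str:
    """Replace every matched PII span with its typed placeholder."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def scrub_pairs(pairs: list[dict[str, str]]) -> list[dict[str, str]]:
    """Redact every message in each (hypothetical) {"user", "assistant"} pair."""
    return [{role: redact(msg) for role, msg in pair.items()} for pair in pairs]
```

For example, `redact("booking AB123456")` returns `"booking [BOOKING_ID]"`. Typed placeholders keep sentence structure intact, so the model still learns the shape of a booking conversation without memorizing the identifiers. Regexes catch only well-formed patterns; names and street addresses would need an NER-based detector.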

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning