Objective Matters: Fine-Tuning Objectives Shape Safety, Robustness, and Persona Drift

Daniel Vennemeyer; Punya Syon Pandey; Phan Anh Duong; Michael Umeokoli; Samuel Ratnam

arXiv:2601.12639·cs.CL·January 21, 2026

Open Access

TL;DR

This paper systematically compares six fine-tuning objectives for large language models, revealing how they influence safety, robustness, and persona stability, especially at larger training scales.

Contribution

It provides a controlled analysis of how different fine-tuning objectives affect safety and robustness, highlighting the importance of objective choice at larger scales.

Findings

01

Supervised and preference-based tuning increase vulnerability and persona drift at large scales.

02

Objectives that constrain learning signals mitigate adversarial vulnerability and persona drift.

03

Safety is less affected by objectives at small training scales.

Abstract

Fine-tuning LLMs on benign data can still degrade alignment and adversarial robustness, yet direct analysis of the role of fine-tuning objectives in shaping these safety outcomes remains limited. We present a controlled comparison of six fine-tuning objectives -- Supervised Fine-Tuning, Direct Preference Optimization, Conditional Fine-Tuning, Inoculation Prompting, Odds Ratio Preference Optimization, and KL-regularized fine-tuning -- holding data, domain, architecture, and optimization fixed. Across closed-form reasoning and open-ended generation tasks, we find that objective choice induces systematic, scale-dependent shifts along the safety-capability frontier. At small training budgets, robustness is similar across objectives but capability differs. At larger budgets, objectives diverge sharply: supervised and preference-based tuning tightly couple capability gains to increased…
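
To make the contrast between objectives concrete, the sketch below writes three of the six as scalar losses on toy numbers, using their commonly published textbook forms. It is an illustration under stated assumptions, not the paper's implementation: the function names, the default values of `beta` and `lam`, and the direction of the KL term (policy relative to reference) are all assumptions that do not come from the abstract.

```python
import math


def sft_loss(token_logps):
    """Supervised fine-tuning: mean negative log-likelihood of target tokens."""
    return -sum(token_logps) / len(token_logps)


def _neg_log_sigmoid(x):
    """Numerically stable -log(sigmoid(x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence log-probabilities under the trained policy and a
    frozen reference model. beta=0.1 is an illustrative default (assumption).
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return _neg_log_sigmoid(beta * margin)


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_regularized_loss(token_logps, policy_dists, ref_dists, lam=0.1):
    """SFT loss plus a penalty for drifting away from the reference model.

    policy_dists and ref_dists hold one next-token distribution per position.
    The KL direction and lam=0.1 are assumptions made for this sketch.
    """
    kl = sum(kl_divergence(p, q) for p, q in zip(policy_dists, ref_dists))
    return sft_loss(token_logps) + lam * kl / len(policy_dists)
```

In this form, the KL-regularized loss reduces to the plain supervised loss when the policy matches the reference, and grows as the policy drifts from it, which is one way an objective can constrain the learning signal. The DPO loss equals log 2 when the policy and reference assign the same log-probabilities to both responses.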

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Persona Design and Applications