Tangent Space Fine-Tuning for Directional Preference Alignment in Large Language Models
Mete Erdogan

TL;DR
This paper introduces Tangent-Space DPO, a novel method for aligning large language models with multiple human preferences by performing preference optimization in a model's tangent space, enabling controllable and multi-objective fine-tuning.
Contribution
The paper extends tangent-space fine-tuning to preference alignment. It proposes TS-DPO, which provides multi-objective control without collapsing the objectives into a single scalar reward, and which improves Pareto coverage and preference disentanglement.
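For orientation, the two ingredients named here can be written down in their standard forms: the usual DPO loss and a first-order linearization of a model around its pretrained weights. The third line is only an assumption about how per-objective directions might be combined, borrowed from task-arithmetic notation; the symbols \tau_i and \lambda_i are illustrative and are not taken from the paper.

```latex
% Standard DPO loss for a policy \pi_\theta against a frozen reference \pi_{\mathrm{ref}}
\mathcal{L}_{\mathrm{DPO}}(\theta) =
  -\mathbb{E}_{(x, y_w, y_l)} \left[ \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]

% First-order (tangent-space) linearization around pretrained weights \theta_0
f_{\mathrm{lin}}(x; \theta) = f(x; \theta_0) + \nabla_\theta f(x; \theta_0)^{\top} (\theta - \theta_0)

% ASSUMPTION (not from the paper): weighted composition of per-objective directions \tau_i
\theta(\lambda) = \theta_0 + \sum_{i} \lambda_i \, \tau_i
```

Because f_lin is affine in \theta, any change in its output caused by a weighted sum of directions equals the same weighted sum of the individual changes, which is what makes additive composition well behaved in this regime.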
Findings
TS-DPO achieves broader Pareto-optimal coverage.
TS-DPO enables smoother preference control.
Tangent-space training enhances preference disentanglement.
Abstract
Our goal is to enable large language models (LLMs) to balance multiple human preference dimensions, such as helpfulness, safety, and verbosity, through principled and controllable alignment. Existing preference optimization methods, including Direct Preference Optimization (DPO), collapse feedback into a single scalar reward, fixing one balance among objectives and preventing traversal of the Pareto front. Recent work by Ortiz-Jimenez et al. (2023) showed that fine-tuning can be viewed in a model's tangent space, where linearized updates act as additive vectors that can be composed to jointly perform well on multiple tasks. Building on this formulation, we extend the idea to preference alignment and propose Tangent-Space Direct Preference Optimization (TS-DPO), which performs DPO within this locally linear regime to learn per-objective update directions. These directions can be…
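The additive behaviour of linearized updates can be seen on a toy problem. The sketch below is not the paper's implementation: the two-parameter `model`, the finite-difference `jacobian`, the direction vectors `tau_helpful` and `tau_safe`, and the mixing weights are all hypothetical stand-ins chosen only to show that, in a linearized model, the output shift from a weighted mix of directions equals the weighted mix of the individual shifts.

```python
import math


def model(theta, x):
    # Toy nonlinear scalar "model"; a stand-in for a network output (assumption).
    return math.tanh(theta[0] * x) + theta[1] * x * x


def jacobian(theta, x, eps=1e-6):
    # Central finite-difference gradient of model(theta, x) with respect to theta.
    grads = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += eps
        dn[i] -= eps
        grads.append((model(up, x) - model(dn, x)) / (2 * eps))
    return grads


def linearized(theta0, tau, x):
    # f_lin(x; theta0 + tau) = f(x; theta0) + J(x; theta0) . tau
    J = jacobian(theta0, x)
    return model(theta0, x) + sum(j * t for j, t in zip(J, tau))


def compose(directions, weights):
    # Weighted sum of per-objective update directions (illustrative only).
    dim = len(directions[0])
    return [sum(w * d[i] for w, d in zip(weights, directions)) for i in range(dim)]


theta0 = [0.5, -0.2]          # hypothetical "pretrained" weights
tau_helpful = [0.10, 0.00]    # hypothetical direction for one objective
tau_safe = [0.00, 0.05]       # hypothetical direction for another objective
x = 1.5

base = model(theta0, x)
shift_helpful = linearized(theta0, tau_helpful, x) - base
shift_safe = linearized(theta0, tau_safe, x) - base

tau_mix = compose([tau_helpful, tau_safe], [0.7, 0.3])
shift_mix = linearized(theta0, tau_mix, x) - base

# In the linearized model, shifts combine with the same weights as the directions.
print(abs(shift_mix - (0.7 * shift_helpful + 0.3 * shift_safe)) < 1e-9)
```

Evaluating the original nonlinear `model` at `theta0 + tau_mix` would generally not satisfy this identity exactly, which is the practical motivation for training and composing in the linearized regime.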
Taxonomy
Topics: Topic Modeling · Recommender Systems and Techniques · Multimodal Machine Learning Applications
