Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization

Fernando Spadea; Oshani Seneviratne

arXiv:2502.14187·cs.LG·February 21, 2025

Open Access

TL;DR

This paper compares Kahneman-Tversky Optimization (KTO) and Direct Preference Optimization (DPO) for fine-tuning large language models in federated learning. It reports that KTO outperforms DPO on the MT-Bench-1, Vicuna, and AdvBench benchmarks and remains usable on single-response feedback data, where DPO cannot be applied.

Contribution

It evaluates KTO as a fine-tuning method for federated LLM training and introduces a redistributed dataset setup in which only KTO is applicable, reporting advantages over DPO in both the original and redistributed settings.

Findings

01. KTO outperforms DPO across all benchmarks.

02. KTO is effective in redistributed datasets where DPO cannot be applied.

03. KTO is robust and scalable for privacy-preserving federated learning.
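
The difference in data requirements behind finding 02 can be illustrated with a minimal sketch. DPO needs each prompt to arrive with a chosen and a rejected response together, while KTO only needs a single response plus a desirable/undesirable label, so paired records can be split and spread over clients. The record fields, the function names `split_pairs` and `redistribute`, and the shuffle-and-deal scheme are all hypothetical; the paper's actual redistribution procedure is not described in this summary.

```python
import random

def split_pairs(pairs):
    """Turn paired preference records into single-response labelled examples."""
    examples = []
    for p in pairs:
        # The chosen response becomes a desirable example, the rejected one undesirable.
        examples.append({"prompt": p["prompt"], "completion": p["chosen"], "label": True})
        examples.append({"prompt": p["prompt"], "completion": p["rejected"], "label": False})
    return examples

def redistribute(examples, num_clients, seed=0):
    """Shuffle examples and deal them round-robin to clients (assumed scheme)."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    return [shuffled[i::num_clients] for i in range(num_clients)]

# Toy paired dataset: six prompts, each with one chosen and one rejected response.
pairs = [
    {"prompt": f"q{i}", "chosen": f"good{i}", "rejected": f"bad{i}"}
    for i in range(6)
]
clients = redistribute(split_pairs(pairs), num_clients=3)
```

After this kind of split, a client may hold only one half of an original pair, which is why a pairwise objective such as DPO no longer applies while a per-example objective such as KTO still does.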

Abstract

We evaluate Kahneman-Tversky Optimization (KTO) as a fine-tuning method for large language models (LLMs) in federated learning (FL) settings, comparing it against Direct Preference Optimization (DPO). Using Alpaca-7B as the base model, we fine-tune on a realistic dataset under both methods and evaluate performance using MT-Bench-1, Vicuna, and AdvBench benchmarks. Additionally, we introduce a redistributed dataset setup, where only KTO is applicable due to its ability to handle single-response feedback, unlike DPO's reliance on paired responses. Our results demonstrate that KTO, in both its original (KTOO) and redistributed (KTOR) configurations, consistently outperforms DPO across all benchmarks. In the redistributed setup, KTO further validates its flexibility and resilience by maintaining superior performance in scenarios where DPO cannot be applied. These findings establish KTO as a…
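
To make the contrast between paired and single-response feedback concrete, here is a minimal per-example sketch of the two objectives as they are commonly stated in the original DPO and KTO papers. It is not taken from this paper. The inputs are policy-to-reference log-probability ratios, and the reference point `z_ref` (a KL estimate in KTO) is passed in as a plain number; the function names and default values for `beta`, `lambda_d`, and `lambda_u` are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logratio_chosen, logratio_rejected, beta=0.1):
    """DPO needs BOTH responses for a prompt: it scores their log-ratio margin."""
    return -math.log(sigmoid(beta * (logratio_chosen - logratio_rejected)))

def kto_loss(logratio, desirable, z_ref=0.0, beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """KTO needs ONE response and a binary label, compared against a reference point."""
    if desirable:
        # Loss falls as the policy raises the likelihood of a desirable response.
        return lambda_d * (1.0 - sigmoid(beta * (logratio - z_ref)))
    # Loss falls as the policy lowers the likelihood of an undesirable response.
    return lambda_u * (1.0 - sigmoid(beta * (z_ref - logratio)))
```

Because `kto_loss` takes a single log-ratio and a label, it can be evaluated on any client that holds individual labelled responses, whereas `dpo_loss` cannot be computed unless the chosen and rejected responses for the same prompt are available together.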

Taxonomy

Topics: Topic Modeling

Methods: Direct Preference Optimization · Balanced Selection