Federated Fine-Tuning of Large Language Models: Kahneman-Tversky vs. Direct Preference Optimization
Fernando Spadea, Oshani Seneviratne

TL;DR
This paper compares Kahneman-Tversky Optimization (KTO) and Direct Preference Optimization (DPO) for fine-tuning large language models in federated learning. KTO outperforms DPO on the MT-Bench-1, Vicuna, and AdvBench benchmarks and remains applicable when only single-response feedback is available.
Contribution
It evaluates KTO as a fine-tuning method for federated LLM training and introduces a redistributed dataset setup, in which only KTO can be applied, to demonstrate its advantages over DPO.
Findings
KTO outperforms DPO across all three benchmarks (MT-Bench-1, Vicuna, and AdvBench).
KTO remains effective on the redistributed dataset, where DPO cannot be applied because it requires paired responses.
KTO is robust and scalable for privacy-preserving federated learning.
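The reason DPO cannot be applied to unpaired data shows up in the shape of the two objectives. The sketch below writes both per-example losses in plain Python, following their standard published forms rather than anything specific to this paper. The function names, the default `beta`, the weights `lambda_d` and `lambda_u`, and the fixed `ref_point` (standing in for KTO's KL-based reference point) are illustrative assumptions. Each `logratio` argument is assumed to be log pi_theta(y|x) minus log pi_ref(y|x), computed elsewhere.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_loss(chosen_logratio: float, rejected_logratio: float,
             beta: float = 0.1) -> float:
    """DPO loss for one prompt: needs BOTH a chosen and a rejected response."""
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(sigmoid(margin))


def kto_loss(logratio: float, desirable: bool, ref_point: float = 0.0,
             beta: float = 0.1, lambda_d: float = 1.0,
             lambda_u: float = 1.0) -> float:
    """KTO loss for one response: needs only a desirable/undesirable label.

    ref_point is a fixed stand-in (assumption) for the KL reference term.
    """
    if desirable:
        return lambda_d * (1.0 - sigmoid(beta * (logratio - ref_point)))
    return lambda_u * (1.0 - sigmoid(beta * (ref_point - logratio)))
```

`dpo_loss` has no meaning without a matched pair, whereas `kto_loss` scores each response independently, so it still works once pairs have been broken apart.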
Abstract
We evaluate Kahneman-Tversky Optimization (KTO) as a fine-tuning method for large language models (LLMs) in federated learning (FL) settings, comparing it against Direct Preference Optimization (DPO). Using Alpaca-7B as the base model, we fine-tune on a realistic dataset under both methods and evaluate performance using MT-Bench-1, Vicuna, and AdvBench benchmarks. Additionally, we introduce a redistributed dataset setup, where only KTO is applicable due to its ability to handle single-response feedback, unlike DPO's reliance on paired responses. Our results demonstrate that KTO, in both its original (KTOO) and redistributed (KTOR) configurations, consistently outperforms DPO across all benchmarks. In the redistributed setup, KTO further validates its flexibility and resilience by maintaining superior performance in scenarios where DPO cannot be applied. These findings establish KTO as a…
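As a rough illustration of the redistributed setup, the sketch below splits each client's preference pairs into single labeled responses and then reshuffles them across clients, so that a prompt's chosen and rejected responses need not stay together. The abstract does not specify the redistribution procedure. The uniform shuffle, the round-robin assignment, and the names `split_pairs` and `redistribute` are assumptions made for this example only.

```python
import random
from typing import Dict, List

Pair = Dict[str, str]        # keys: "prompt", "chosen", "rejected"
Single = Dict[str, object]   # keys: "prompt", "response", "desirable"


def split_pairs(pairs: List[Pair]) -> List[Single]:
    """Turn each (chosen, rejected) pair into two single-response examples."""
    singles: List[Single] = []
    for p in pairs:
        singles.append({"prompt": p["prompt"], "response": p["chosen"],
                        "desirable": True})
        singles.append({"prompt": p["prompt"], "response": p["rejected"],
                        "desirable": False})
    return singles


def redistribute(client_pairs: List[List[Pair]], seed: int = 0) -> List[List[Single]]:
    """Pool all single-response examples, shuffle, and deal them out round-robin.

    Assumption: a uniform shuffle; the paper's actual scheme may differ.
    """
    pool = [ex for pairs in client_pairs for ex in split_pairs(pairs)]
    random.Random(seed).shuffle(pool)
    n_clients = len(client_pairs)
    return [pool[i::n_clients] for i in range(n_clients)]
```

After redistribution a client may hold only one response for a given prompt, which is sufficient for a KTO-style objective but leaves DPO without the pair it needs.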
Taxonomy
Topics: Topic Modeling
Methods: Direct Preference Optimization · Balanced Selection
