Towards the Next Frontier of LLMs, Training on Private Data: A Cross-Domain Benchmark for Federated Fine-Tuning
Daniel M. Jimenez-Gutierrez, Enrique Zuazua, Georgios Kellaris, Joaquin del Rio, Oleksii Sliusarenko, Xabi Uribe-Etxebarria

TL;DR
This paper presents a federated learning framework for fine-tuning large language models on private, non-IID institutional data in healthcare and finance, demonstrating near-centralized performance with parameter-efficient fine-tuning (PEFT) strategies.
Contribution
It introduces a practical federated fine-tuning approach that applies parameter-efficient fine-tuning (PEFT) methods to private data, enabling LLM adaptation without sharing data across institutions.
Findings
Federated fine-tuning approaches perform close to centralized training.
PEFT methods like QLoRA and IA3 improve efficiency with minimal accuracy loss.
The framework effectively handles non-IID data across different domains.
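To make the idea of federated fine-tuning with PEFT concrete, here is a minimal sketch of one server-side aggregation round in which only small adapter weights (not raw data, and not the full model) leave each institution. The summary above does not state which aggregation rule the paper uses; sample-size-weighted averaging (FedAvg) is assumed here purely for illustration, and the function name `fedavg_adapters`, the parameter names `lora_A` and `lora_B`, and the two example sites are all hypothetical.

```python
from typing import Dict, List, Tuple

# Parameter name -> flattened adapter weights (hypothetical layout).
Adapter = Dict[str, List[float]]


def fedavg_adapters(updates: List[Tuple[Adapter, int]]) -> Adapter:
    """Average client adapter weights, weighting each client by its sample count.

    Assumption: plain FedAvg; the paper's actual aggregation rule may differ.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Adapter = {}
    for name in updates[0][0].keys():
        acc = [0.0] * len(updates[0][0][name])
        for adapter, n in updates:
            weight = n / total  # larger sites contribute proportionally more
            for i, value in enumerate(adapter[name]):
                acc[i] += weight * value
        averaged[name] = acc
    return averaged


# Two hypothetical sites holding different amounts of local data.
hospital = ({"lora_A": [1.0, 2.0], "lora_B": [0.0, 4.0]}, 300)
bank = ({"lora_A": [3.0, 6.0], "lora_B": [4.0, 0.0]}, 100)
global_adapter = fedavg_adapters([hospital, bank])
```

In this toy round the hospital holds three quarters of the samples, so the aggregated adapter sits closer to its weights. Under non-IID data the client updates can diverge substantially, which is one reason comparing federated results against a centralized baseline is informative.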
Abstract
The recent success of large language models (LLMs) has been largely driven by vast public datasets. However, the next frontier for LLM development lies beyond public data. Much of the world's most valuable information is private, especially in highly regulated sectors such as healthcare and finance, where data include patient histories or customer communications. Unlocking this data could represent a major leap forward, enabling LLMs with deeper domain expertise and stronger real-world utility. Yet, these data cannot be shared because they are distributed across institutions and constrained by privacy, regulatory, and organizational barriers. Moreover, institutional datasets are typically not independent and identically distributed (non-IID), differing across sites in population characteristics, data modalities, documentation patterns, and task-specific label distributions. In this…
