Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models
Zezhou Wang, Yaxin Du, Xingjun Ma, Yugang Jiang, Zhuzhong Qian, Siheng Chen

TL;DR
This paper identifies cross-client domain coverage as crucial for federated instruction tuning of large language models and introduces FedDCA, an algorithm that enhances coverage and performance through diversity-driven client selection and augmentation.
Contribution
The paper shows that cross-client domain coverage, rather than data heterogeneity, is the main driver of performance, and proposes FedDCA, a method that maximizes this coverage to improve federated LLM instruction tuning.
Findings
FedDCA outperforms 11 baselines with up to 29.19% performance gains.
It increases domain coverage by 4.82% to 21.36%.
It remains effective in scenarios with limited or heterogeneous data.
Abstract
Federated domain-specific instruction tuning (FedDIT) for large language models (LLMs) aims to enhance performance in specialized domains using distributed private and limited data, yet identifying key performance drivers and optimal augmentation strategies remains challenging. We empirically establish that cross-client domain coverage, rather than data heterogeneity, is the pivotal factor. We then introduce FedDCA, an algorithm that explicitly maximizes this coverage through diversity-oriented client center selection and retrieval-based augmentation, constructing diverse, non-redundant cross-client instruction sets. Extensive experiments across multiple domains demonstrate FedDCA's superiority over eleven baselines, achieving performance gains of up to 29.19% and domain coverage improvements of 4.82% to 21.36%. FedDCA maintains its effectiveness in diverse and challenging scenarios,…
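To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of diversity-oriented center selection followed by retrieval-based augmentation. It is not the paper's implementation. It assumes instructions are already embedded as vectors, substitutes plain farthest-point (k-center greedy) selection for whatever selection rule FedDCA actually uses, and measures coverage with a simple radius-based proxy. The function names `greedy_centers`, `retrieve`, and `coverage_proxy` are hypothetical.

```python
import numpy as np


def greedy_centers(embeddings, k, seed_index=0):
    """Pick k mutually distant points (farthest-point / k-center greedy).

    Assumption: this stands in for diversity-oriented center selection;
    the paper's actual selection rule may differ.
    """
    chosen = [seed_index]
    # Distance from every point to its nearest chosen center so far.
    dist = np.linalg.norm(embeddings - embeddings[seed_index], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))  # point farthest from all chosen centers
        chosen.append(nxt)
        new_dist = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        dist = np.minimum(dist, new_dist)
    return chosen


def retrieve(pool, centers, per_center):
    """For each center, take its nearest pool items, skipping items already
    retrieved for another center so the augmented set is non-redundant."""
    picked, seen = [], set()
    for center in centers:
        order = np.argsort(np.linalg.norm(pool - center, axis=1))
        fresh = [int(i) for i in order if int(i) not in seen][:per_center]
        seen.update(fresh)
        picked.extend(fresh)
    return picked


def coverage_proxy(domain, selected, radius):
    """Fraction of domain points within `radius` of some selected point.

    Assumption: an illustrative proxy, not the paper's coverage metric.
    """
    gaps = np.linalg.norm(domain[:, None, :] - selected[None, :, :], axis=2)
    return float((gaps.min(axis=1) <= radius).mean())


if __name__ == "__main__":
    # Toy 2-D "embeddings": two near-duplicates and two distant points.
    points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.0, 5.0]])
    centers = greedy_centers(points, k=3)
    augmented = retrieve(points, points[centers], per_center=1)
    print(centers, augmented, coverage_proxy(points, points[augmented], 1.0))
```

In this toy setup the greedy step skips the near-duplicate point, which mirrors the abstract's aim of diverse, non-redundant instruction sets. In a federated setting each client would presumably run selection on its own private embeddings, with retrieval performed against a shared public pool.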
