Optimizing Cross-Client Domain Coverage for Federated Instruction Tuning of Large Language Models

Zezhou Wang; Yaxin Du; Xingjun Ma; Yugang Jiang; Zhuzhong Qian; Siheng Chen

arXiv:2409.20135 · cs.LG · August 22, 2025

TL;DR

This paper identifies cross-client domain coverage as the key performance driver in federated instruction tuning of large language models and introduces FedDCA, an algorithm that raises coverage and performance through diversity-oriented client center selection and retrieval-based augmentation.

Contribution

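The paper shows empirically that cross-client domain coverage, rather than data heterogeneity, is the pivotal factor in federated domain-specific instruction tuning, and proposes FedDCA, a method that explicitly maximizes this coverage to improve federated LLM instruction tuning.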

Findings

01 FedDCA outperforms eleven baselines, with performance gains of up to 29.19%.

02 FedDCA increases domain coverage by 4.82% to 21.36%.

03 FedDCA remains effective in scenarios with limited or heterogeneous data.

Abstract

Federated domain-specific instruction tuning (FedDIT) for large language models (LLMs) aims to enhance performance in specialized domains using distributed private and limited data, yet identifying key performance drivers and optimal augmentation strategies remains challenging. We empirically establish that cross-client domain coverage, rather than data heterogeneity, is the pivotal factor. We then introduce FedDCA, an algorithm that explicitly maximizes this coverage through diversity-oriented client center selection and retrieval-based augmentation, constructing diverse, non-redundant cross-client instruction sets. Extensive experiments across multiple domains demonstrate FedDCA's superiority over eleven baselines, achieving performance gains of up to 29.19% and domain coverage improvements of 4.82% to 21.36%. FedDCA maintains its effectiveness in diverse and challenging scenarios,…
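
The two steps named in the abstract, diversity-oriented client center selection and retrieval-based augmentation, can be pictured with a small sketch. This is an illustration under stated assumptions, not the FedDCA implementation: the greedy farthest-point rule, the use of cosine similarity, and the function names `select_diverse_centers` and `augment` are all hypothetical choices made here, and embeddings are plain Python lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_diverse_centers(candidates, k):
    """Assumed selection rule: greedy farthest-point picking.

    Starts from candidate 0, then repeatedly adds the candidate whose
    highest similarity to the already chosen centers is lowest.
    Returns the indices of the chosen candidates.
    """
    chosen = [0]
    while len(chosen) < min(k, len(candidates)):
        best_idx, best_score = None, None
        for i, cand in enumerate(candidates):
            if i in chosen:
                continue
            score = max(cosine(cand, candidates[j]) for j in chosen)
            if best_score is None or score < best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return chosen


def augment(centers, public_pool, per_center):
    """Assumed retrieval rule: for each center, take the most similar
    public items that no earlier center has already taken, so the
    combined cross-client set stays non-redundant.
    Returns indices into public_pool in the order they were retrieved.
    """
    taken = []
    for center in centers:
        ranked = sorted(
            range(len(public_pool)),
            key=lambda i: cosine(center, public_pool[i]),
            reverse=True,
        )
        fresh = [i for i in ranked if i not in taken][:per_center]
        taken.extend(fresh)
    return taken


# Toy 2-D embeddings: two near-duplicate candidates and one distinct one.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
picked = select_diverse_centers(candidates, 2)
pool = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]
retrieved = augment([candidates[i] for i in picked], pool, 2)
```

In this toy run the near-duplicate candidate is skipped in favor of the distinct one, and each public item is retrieved at most once across centers, which is the intuition behind a diverse, non-redundant cross-client instruction set.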
