FedSDR: Federated Self-Distillation with Rectification
Ziheng Ren, Zhanming Shen, Hao Wang, Ning Liu, You Song

TL;DR
FedSDR introduces a federated self-distillation framework with rectification that improves federated fine-tuning of large language models under data heterogeneity and uses a dual-stream mechanism to curb hallucinations.
Contribution
It proposes FedSDR, a novel federated self-distillation method with rectification that enhances model fidelity and robustness against data distribution mismatches.
Findings
FedSDR outperforms existing federated learning algorithms.
The dual-stream mechanism effectively reduces hallucinations.
Extensive experiments confirm the superior performance of FedSDR.
Abstract
Federated fine-tuning of Large Language Models faces severe statistical heterogeneity. However, existing model-level defenses often overlook the root cause: intrinsic data distribution mismatches. In this work, we first establish Federated Self-Distillation (FedSD) as a fundamental and potent strategy. By projecting client representations into a smoothed "model-understanding space," FedSD alone serves as a universal booster, demonstrating superior performance over conventional algorithms. Despite its success, we identify a subtle trade-off termed the Rewrite Paradox: unconstrained self-distillation can inadvertently increase hallucinations and redundancy. To refine this paradigm, we further propose FedSDR (Federated Self-Distillation with Rectification), the ultimate reinforced framework. It augments FedSD with a dual-stream mechanism: a local LoRA-S (Smoothing) branch to implicitly…
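The abstract does not spell out the training loop, so the following is only an illustrative sketch of one plausible round of plain federated self-distillation (FedSD, without the LoRA-S rectification branch). It assumes that (i) the self-distillation step has the current global model restate each client's reference responses in its own words, (ii) each client then fine-tunes locally on the rewritten data, and (iii) the server combines client updates with standard sample-weighted FedAvg. The names `fedsd_round`, `rewrite_fn`, `train_fn`, and the dict-of-vectors parameter layout are hypothetical placeholders, not the authors' code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: a model is a dict of named flat parameter vectors
# (e.g. flattened LoRA adapter weights).
Params = Dict[str, List[float]]
Example = Tuple[str, str]  # (instruction, reference response)


def fedavg(client_params: List[Params], weights: List[int]) -> Params:
    """Average client parameters, weighting each client by its example count."""
    total = sum(weights)
    averaged: Params = {}
    for key in client_params[0]:
        dim = len(client_params[0][key])
        averaged[key] = [
            sum(w * p[key][i] for p, w in zip(client_params, weights)) / total
            for i in range(dim)
        ]
    return averaged


def fedsd_round(
    global_params: Params,
    client_data: List[List[Example]],
    rewrite_fn: Callable[[Params, str, str], str],
    train_fn: Callable[[Params, List[Example]], Params],
) -> Params:
    """One hypothetical communication round of federated self-distillation."""
    updated, sizes = [], []
    for data in client_data:
        # Assumed self-distillation step: the global model rewrites each target,
        # moving local data toward its own "model-understanding space".
        distilled = [(x, rewrite_fn(global_params, x, y)) for x, y in data]
        # Local fine-tuning starts from a copy of the global weights.
        local_start = {k: list(v) for k, v in global_params.items()}
        updated.append(train_fn(local_start, distilled))
        sizes.append(len(data))
    # Server aggregation: standard FedAvg, assumed here rather than taken from the paper.
    return fedavg(updated, sizes)
```

With toy stand-ins for `rewrite_fn` and `train_fn`, the round can be exercised end to end; in a real system these would wrap LLM generation and LoRA fine-tuning.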
