Enhancing Data Quality in Federated Fine-Tuning of Foundation Models

Wanru Zhao; Yaxin Du; Nicholas Donald Lane; Siheng Chen; Yanfeng Wang

arXiv:2403.04529 · cs.LG · March 8, 2024 · 1 citation

TL;DR

This paper introduces a data quality control pipeline for federated fine-tuning of foundation models, enabling collaboration across private data sources while maintaining data privacy and improving overall model performance.

Contribution

It proposes a novel data quality scoring and thresholding method tailored for federated learning, enhancing data selection and model training effectiveness.

Findings

1. Improved model performance with quality-controlled data

2. Enhanced reliability of federated fine-tuning process

3. Effective data scoring and thresholding mechanism

Abstract

In the current landscape of foundation model training, there is a significant reliance on public domain data, which is nearing exhaustion according to recent research. To further scale up, it is crucial to incorporate collaboration among multiple specialized and high-quality private domain data sources. However, the challenge of training models locally without sharing private data presents numerous obstacles in data quality control. To tackle this issue, we propose a data quality control pipeline for federated fine-tuning of foundation models. This pipeline computes scores reflecting the quality of training data and determines a global threshold for a unified standard, aiming for improved global performance. Our experiments show that the proposed quality control pipeline facilitates the effectiveness and reliability of the model training, leading to better performance.
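
The score-then-threshold idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how scores are computed or how the global threshold is chosen. Here each client scores its own samples with a hypothetical `toy_score` function, only the scores leave the client, and a hypothetical server-side `global_threshold` picks the cutoff that keeps a chosen fraction of the pooled scores. All function names and the `keep_fraction` parameter are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def toy_score(text: str) -> float:
    """Hypothetical quality score: word count scaled into [0, 1].

    A stand-in only; the paper's actual scoring method is not given here.
    """
    return min(1.0, len(text.split()) / 5)


def score_locally(samples: List[str], score_fn: Callable[[str], float]) -> List[float]:
    """Runs on a client: score every local sample without sharing the data."""
    return [score_fn(sample) for sample in samples]


def global_threshold(client_scores: Dict[str, List[float]], keep_fraction: float) -> float:
    """Runs on the server: choose one cutoff from the pooled client scores.

    Assumption: the cutoff is the score of the k-th best sample overall,
    where k is keep_fraction of all scored samples.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    pooled = sorted(
        (score for scores in client_scores.values() for score in scores),
        reverse=True,
    )
    if not pooled:
        raise ValueError("no scores received from clients")
    k = max(1, round(keep_fraction * len(pooled)))
    return pooled[k - 1]


def filter_locally(samples: List[str], scores: List[float], threshold: float) -> List[str]:
    """Runs on a client: keep only samples that meet the shared threshold."""
    return [sample for sample, score in zip(samples, scores) if score >= threshold]


if __name__ == "__main__":
    clients = {
        "client_a": ["good answer with detail", "??", "ok reply here"],
        "client_b": ["asdf", "a clear and complete response"],
    }
    scores = {name: score_locally(data, toy_score) for name, data in clients.items()}
    threshold = global_threshold(scores, keep_fraction=0.6)
    for name, data in clients.items():
        kept = filter_locally(data, scores[name], threshold)
        print(name, kept)
```

Because every client filters against the same cutoff, the retained data follows one unified standard even though the raw samples never leave their owners. Whether sharing raw scores is acceptable from a privacy standpoint depends on the deployment; this sketch does not address that.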

Taxonomy

Topics: Geological Modeling and Analysis · 3D Modeling in Geospatial Applications