Data Collaboration Analysis applied to Compound Datasets and the Introduction of Projection data to Non-IID settings
Akihiro Mizoguchi, Anna Bogdanova, Akira Imakura, and Tetsuya Sakurai

TL;DR
This paper introduces DCPd (data collaboration analysis using projection data), a distributed machine learning method that improves prediction accuracy on compound datasets in non-IID settings, where federated learning performs poorly.
Contribution
The paper proposes DCPd, a novel data collaboration analysis method that uses auxiliary PubChem data to improve predictions on compound data in non-IID settings.
Findings
DCPd outperforms FedAvg and DC in non-IID settings.
DCPd maintains high accuracy and is only minimally affected by label bias.
Federated learning struggles with non-IID compound data, but DCPd mitigates this.
Abstract
Given the time and expense associated with bringing a drug to market, numerous studies have used machine learning to predict the properties of compounds from their structure. Federated learning has been applied to compound datasets to increase prediction accuracy while safeguarding potentially proprietary information. However, federated learning suffers from low accuracy in non-independent and identically distributed (non-IID) settings, i.e., when the data partitioning has a large label bias, and is therefore considered unsuitable for compound datasets, which tend to have a large label bias. To address this limitation, we applied an alternative distributed machine learning method, called data collaboration analysis (DC), to chemical compound data from open sources. We also proposed data collaboration analysis using projection data (DCPd), which is an improved method that…
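To make the notion of label bias concrete, the following is a minimal sketch of how a binary-labelled dataset might be split across clients with a controllable degree of label skew. It is an illustration only, not the partitioning scheme used in the paper: the function names, the `bias` parameter, and the rule that class k "prefers" client k mod n_clients are all assumptions introduced here.

```python
import random
from collections import Counter


def partition_with_label_bias(labels, n_clients, bias, seed=0):
    """Split sample indices across clients with a controllable label bias.

    bias = 0.0: every sample goes to a uniformly random client (roughly IID).
    bias = 1.0: every sample goes to the client preferred by its label
                (fully label-skewed, i.e. strongly non-IID).
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    # Hypothetical rule for this sketch: class k prefers client k mod n_clients.
    preferred = {c: i % n_clients for i, c in enumerate(classes)}
    clients = [[] for _ in range(n_clients)]
    for idx, y in enumerate(labels):
        if rng.random() < bias:
            target = preferred[y]
        else:
            target = rng.randrange(n_clients)
        clients[target].append(idx)
    return clients


def label_counts(labels, clients):
    """Count how many samples of each label each client holds."""
    return [Counter(labels[i] for i in part) for part in clients]


# Toy dataset: 500 samples of class 0 and 500 of class 1 (made-up data).
labels = [0] * 500 + [1] * 500
iid_parts = partition_with_label_bias(labels, n_clients=2, bias=0.0)
skewed_parts = partition_with_label_bias(labels, n_clients=2, bias=0.9)

print("bias=0.0:", label_counts(labels, iid_parts))
print("bias=0.9:", label_counts(labels, skewed_parts))
```

With `bias` near 1, each client sees almost exclusively one class, which is the kind of partition the abstract describes as having a large label bias.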
Taxonomy
Topics: Computational Drug Discovery Methods · Metabolomics and Mass Spectrometry Studies · Analytical Chemistry and Chromatography
