LF2L: Loss Fusion Horizontal Federated Learning Across Heterogeneous Feature Spaces Using External Datasets Effectively: A Case Study in Second Primary Cancer Prediction
Chia-Fu Lin, Yi-Ju Tseng

TL;DR
This paper introduces LF2L, a federated learning framework that effectively combines heterogeneous external datasets for improved second primary cancer prediction without sharing sensitive data.
Contribution
The study proposes a novel loss fusion horizontal federated learning approach that handles feature heterogeneity and privacy constraints, enhancing predictive performance in clinical settings.
Findings
Significant AUROC and AUPRC improvements over baselines
Effective integration of external datasets enhances model accuracy
Preserves privacy while enabling cross-institutional collaboration
Abstract
Second primary cancer (SPC), a new cancer that develops in a patient and is distinct from a previously diagnosed cancer, is a growing concern as cancer survival rates improve. Early prediction of SPC is essential for timely clinical intervention. This study focuses on lung cancer survivors treated in Taiwanese hospitals, where the limited size and geographic scope of local datasets restrict the effectiveness and generalizability of traditional machine learning approaches. To address this, we incorporate external data from the publicly available US-based Surveillance, Epidemiology, and End Results (SEER) program, significantly increasing data diversity and scale. However, integrating multi-source datasets presents challenges such as feature inconsistency and privacy constraints. Rather than naively merging the data, we propose a loss fusion horizontal federated learning (LF2L) framework that can…
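The abstract is cut off before it describes the LF2L mechanism. For orientation only, here is a minimal sketch of one generic way a "loss fusion" step could look in horizontal federated learning when sites collect different feature subsets. It is not the authors' method: the feature names, zero-filling of features a site does not collect, logistic regression as the model, and weighting each site's loss by its share of samples are all assumptions. Each site computes its mean loss and gradient locally, and only those aggregates reach the server, which sums them into one fused objective and takes a gradient step.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical union of feature names across all sites (illustrative only).
# Site A collects x_only_a, site B collects x_only_b; both share x_shared*.
GLOBAL_FEATURES = ["x_shared1", "x_shared2", "x_only_a", "x_only_b"]

Site = Tuple[List[Dict[str, float]], List[int]]


def align(record: Dict[str, float], features: Sequence[str]) -> List[float]:
    """Project a site-specific record onto the global feature order.
    Assumption: features a site does not collect are zero-filled."""
    return [float(record.get(f, 0.0)) for f in features]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_loss_and_grad(w: List[float], b: float, rows, labels):
    """Mean binary cross-entropy and its gradient, computed on-site.
    Only these aggregates, never the raw rows, are sent to the server."""
    n = len(rows)
    loss, gb = 0.0, 0.0
    gw = [0.0] * len(w)
    for record, y in zip(rows, labels):
        x = align(record, GLOBAL_FEATURES)
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        err = p - y
        for j, xj in enumerate(x):
            gw[j] += err * xj  # zero-filled features add no gradient
        gb += err
    return loss / n, [g / n for g in gw], gb / n


def fused_step(w: List[float], b: float, sites: List[Site], lr: float = 0.1):
    """Server side: fuse site losses into one weighted objective and take a
    gradient step. Returns updated (w, b) and the fused loss at the old params."""
    total = sum(len(rows) for rows, _ in sites)
    fused_loss, gb = 0.0, 0.0
    gw = [0.0] * len(w)
    for rows, labels in sites:
        weight = len(rows) / total  # assumption: weight by sample share
        loss, g, g0 = local_loss_and_grad(w, b, rows, labels)
        fused_loss += weight * loss
        gw = [a + weight * c for a, c in zip(gw, g)]
        gb += weight * g0
    new_w = [wi - lr * gi for wi, gi in zip(w, gw)]
    return new_w, b - lr * gb, fused_loss


# Toy, made-up data purely to exercise the code.
site_a: Site = (
    [{"x_shared1": 0.6, "x_shared2": 0.3, "x_only_a": 1.0},
     {"x_shared1": 0.2, "x_shared2": 0.1, "x_only_a": 0.0}],
    [1, 0],
)
site_b: Site = (
    [{"x_shared1": 0.7, "x_shared2": 0.5, "x_only_b": 1.0},
     {"x_shared1": 0.1, "x_shared2": 0.2, "x_only_b": 0.0}],
    [1, 0],
)

if __name__ == "__main__":
    w, b = [0.0] * len(GLOBAL_FEATURES), 0.0
    for _ in range(200):
        w, b, loss = fused_step(w, b, [site_a, site_b])
    print(f"fused loss at final round: {loss:.4f}")
```

Zero-filling is the simplest possible alignment: a missing feature contributes nothing to that site's gradient, but it also shifts that site's predictions, so a real system would need a more careful treatment of heterogeneous feature spaces.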
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning in Healthcare · Chronic Disease Management Strategies
