Ontology- and LLM-based Data Harmonization for Federated Learning in Healthcare

Natallia Kokash; Lei Wang; Thomas H. Gillespie; Adam Belloum; Paola Grosso; Sara Quinney; Lang Li; Bernard de Bono

arXiv:2505.20020 · cs.LG · May 27, 2025 · 2 citations

TL;DR

This paper introduces a novel two-step data harmonization approach using ontologies and large language models to improve federated learning in healthcare, addressing data heterogeneity and privacy concerns.

Contribution

The paper integrates ontologies and LLMs for data alignment to support federated learning in healthcare settings.

Findings

1. Effective semantic mapping of EHR data demonstrated
2. Improved data harmonization in federated learning environments
3. Enhanced privacy-preserving data collaboration

Abstract

The rise of electronic health records (EHRs) has unlocked new opportunities for medical research, but privacy regulations and data heterogeneity remain key barriers to large-scale machine learning. Federated learning (FL) enables collaborative modeling without sharing raw data, yet faces challenges in harmonizing diverse clinical datasets. This paper presents a two-step data alignment strategy integrating ontologies and large language models (LLMs) to support secure, privacy-preserving FL in healthcare, demonstrating its effectiveness in a real-world project involving semantic mapping of EHR data.
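The abstract does not spell out how the two alignment steps work. What follows is a minimal sketch of one way such a pipeline could look, not the authors' implementation. It assumes step one is an exact lookup of local EHR column names against ontology labels and synonyms, and step two proposes a term for whatever step one leaves unmatched. The LLM is replaced here by a plain string-similarity ranker built on Python's `difflib`; the toy ontology, the `TERM:000x` identifiers, the function names, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

# Hypothetical toy ontology: term ID -> preferred label and synonyms.
ONTOLOGY = {
    "TERM:0001": {"label": "systolic blood pressure", "synonyms": ["sbp", "systolic bp"]},
    "TERM:0002": {"label": "body mass index", "synonyms": ["bmi"]},
    "TERM:0003": {"label": "heart rate", "synonyms": ["pulse", "hr"]},
}


def normalize(text: str) -> str:
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("_", " ").split())


def ontology_match(column: str) -> Optional[str]:
    """Step 1 (assumed): exact match of a normalized name against labels and synonyms."""
    name = normalize(column)
    for term_id, entry in ONTOLOGY.items():
        names = [entry["label"], *entry["synonyms"]]
        if name in (normalize(n) for n in names):
            return term_id
    return None


def similarity_ranker(column: str) -> tuple[str, float]:
    """Stand-in for the LLM step: return the term whose label or synonym is most similar."""
    name = normalize(column)
    best_id, best_score = "", 0.0
    for term_id, entry in ONTOLOGY.items():
        for candidate in [entry["label"], *entry["synonyms"]]:
            score = SequenceMatcher(None, name, normalize(candidate)).ratio()
            if score > best_score:
                best_id, best_score = term_id, score
    return best_id, best_score


def harmonize(
    columns: list[str],
    suggest: Callable[[str], tuple[str, float]] = similarity_ranker,
    threshold: float = 0.6,
) -> dict[str, tuple[Optional[str], str]]:
    """Map each local column to (term ID or None, how it was mapped)."""
    mapping: dict[str, tuple[Optional[str], str]] = {}
    for col in columns:
        term = ontology_match(col)
        if term is not None:
            mapping[col] = (term, "ontology")
            continue
        # Step 2 (assumed): fall back to a suggester for unmatched names.
        suggested, score = suggest(col)
        mapping[col] = (suggested, "suggested") if score >= threshold else (None, "unmapped")
    return mapping


if __name__ == "__main__":
    for column, result in harmonize(["SBP", "Body_Mass_Index", "heart rte", "zip_code"]).items():
        print(column, "->", result)
```

Passing the suggester in as a parameter keeps the deterministic ontology step separate from the fuzzy step, so the `difflib` stand-in could be swapped for a call to an actual LLM; the paper's abstract does not describe that interface, so any such call is an assumption.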

Taxonomy

Topics: Privacy-Preserving Technologies in Data