Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports
Malte Tölle, Lukas Burger, Halvar Kelm, Florian André, Peter Bannas, Gerhard Diller, Norbert Frey, Philipp Garthe, Stefan Groß, Anja Hennemuth, Lars Kaderali, Nina Krüger, Andreas Leha, Simon Martin, Alexander Meyer, Eike Nagel, Stefan Orwat, Clemens Scherer

TL;DR
This paper presents an open platform leveraging DICOM structured reports to create harmonized multi-modal datasets for federated learning, addressing data heterogeneity across multiple hospitals.
Contribution
It introduces a novel platform that standardizes and filters diverse data types for federated learning using DICOM structured reports, applicable across different data modalities.
Findings
Successfully applied to diverse data types across eight hospitals.
Enabled concurrent filtering and harmonization of multi-modal datasets.
Facilitated predictive modeling for heart valve replacement outcomes.
Abstract
Purpose: Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in the emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and filtering options, is of paramount importance. Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration and interactive filtering capabilities that simplifies the process of assembling multi-modal datasets. Results: In this study, we extend our prior work by showing its applicability to more and divergent data types, as well as streamlining…
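To make the idea of filtering a harmonized multi-modal dataset more concrete, the following is a minimal sketch in plain Python. It is not the authors' platform and does not use highdicom. It only mimics the way DICOM structured reports name each value with a coded concept (code value, coding scheme, meaning). All class names, the function `filter_cohort`, the `99EXAMPLE` coding scheme, the codes, and the patient records are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodedConcept:
    """Code triple used to name a value, in the style of DICOM SR content items."""
    value: str
    scheme: str
    meaning: str


@dataclass
class PatientRecord:
    """Hypothetical harmonized record: one entry per patient across all sites."""
    patient_id: str
    # concept -> (numeric value, unit string)
    measurements: dict = field(default_factory=dict)
    # modalities for which this patient has data, e.g. {"CT", "ECG"}
    modalities: set = field(default_factory=set)


# Hypothetical codes in a made-up private coding scheme (assumption, not from the paper).
EJECTION_FRACTION = CodedConcept("0001", "99EXAMPLE", "Ejection fraction")
AGE = CodedConcept("0002", "99EXAMPLE", "Age")


def filter_cohort(records, required_modalities, numeric_ranges):
    """Keep records that have every required modality and whose coded
    measurements fall inside the given inclusive (low, high) ranges."""
    selected = []
    for record in records:
        if not required_modalities <= record.modalities:
            continue  # at least one required modality is missing
        in_range = True
        for concept, (low, high) in numeric_ranges.items():
            measurement = record.measurements.get(concept)
            if measurement is None or not (low <= measurement[0] <= high):
                in_range = False  # value missing or outside the range
                break
        if in_range:
            selected.append(record)
    return selected


# Made-up records from three hypothetical sites.
records = [
    PatientRecord("site-a-001", {EJECTION_FRACTION: (35.0, "%"), AGE: (81, "a")}, {"CT", "ECG"}),
    PatientRecord("site-b-007", {EJECTION_FRACTION: (60.0, "%"), AGE: (79, "a")}, {"CT"}),
    PatientRecord("site-c-003", {AGE: (84, "a")}, {"CT", "ECG"}),
]

# Patients with both CT and ECG data and an ejection fraction of at most 40 %.
cohort = filter_cohort(records, {"CT", "ECG"}, {EJECTION_FRACTION: (0.0, 40.0)})
print([record.patient_id for record in cohort])  # ['site-a-001']
```

Because every site keys its values by the same coded concepts rather than by local column names, the same filter expression can be evaluated unchanged at each site. That is the property a federated setup relies on.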
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data
