SPHN Connector - a scalable pipeline for generating validated knowledge graphs from federated and semantically enriched health data
Vasundra Touré, Deepak Unni, Philip Krauss, Andrea Brites Marto, Katie Kalt, Nicola Stoira, Maximilian Pickl, Sabine Österle

TL;DR
The SPHN Connector is a tool that helps health institutions create standardized, privacy-protected knowledge graphs from their data, enabling better data sharing and reuse in biomedical research.
Contribution
The SPHN Connector introduces a scalable, federated pipeline for generating validated, semantically enriched knowledge graphs from decentralized health data sources.
Findings
The SPHN Connector allows institutions to build semantically enriched knowledge graphs locally while maintaining data governance.
It supports federated data integration, enabling linkage of clinical and omics data from the same patient across different sites.
The tool facilitates data transformation, de-identification, and validation for compliance with Semantic Web standards.
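The three steps named in the findings (transformation, de-identification, validation) can be illustrated with a minimal Python sketch. It is not the SPHN Connector's implementation: the `example.org` namespaces, the property names, the shared-key HMAC pseudonymization, and the required-property check are all assumptions made for illustration. The check is only a crude stand-in for real constraint validation (for example with SHACL).

```python
import hashlib
import hmac

# Hypothetical namespaces, for illustration only (not the SPHN schema).
BASE = "https://example.org/resource/"
SCHEMA = "https://example.org/schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def pseudonymize(patient_id, secret):
    """De-identification (assumed approach): keyed hash of the local ID.

    Sites that share the same secret derive the same pseudonym for the
    same patient, which is one way records could be linked across sites.
    """
    digest = hmac.new(secret, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record, secret):
    """Transformation: turn one flat record into N-Triples lines."""
    subject = f"<{BASE}patient/{pseudonymize(record['patient_id'], secret)}>"
    triples = [f"{subject} <{RDF_TYPE}> <{SCHEMA}Patient> ."]
    for key, value in record.items():
        if key == "patient_id":
            continue  # the raw identifier never enters the graph
        literal = escape_literal(str(value))
        triples.append(
            f'{subject} <{SCHEMA}{key}> "{literal}"^^<{XSD_STRING}> .'
        )
    return triples


def validate(triples, required_properties):
    """Validation (simplified): list required properties that are missing."""
    # The predicate is the second whitespace-separated token of each line.
    present = {t.split(" ")[1] for t in triples}
    return [p for p in required_properties if f"<{SCHEMA}{p}>" not in present]


if __name__ == "__main__":
    record = {"patient_id": "P001", "diagnosis_code": "I10"}
    graph = record_to_ntriples(record, secret=b"site-shared-key")
    print("\n".join(graph))
    print("missing:", validate(graph, ["diagnosis_code", "birth_year"]))
```

Each institution would run such a step locally on its own records, so only the pseudonymized, validated triples would ever leave the site.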
Abstract
The integration and reuse of heterogeneous health data, including clinical records, cohort studies, and omics datasets, are essential for advancing modern biomedical research. Knowledge graphs offer a powerful means to semantically link such data, enabling interoperability and reuse. The Swiss Personalized Health Network (SPHN) has developed a comprehensive semantic interoperability framework to implement the FAIR (Findable, Accessible, Interoperable, Reusable) principles at a national level. This paper presents the strategy adopted and the resulting SPHN Connector tool, which enables data providers to transform their local data into semantically enriched knowledge graphs following RDF and related Semantic Web standards. Rather than requiring centralized data transformation, the SPHN Connector allows each institution to build knowledge graphs locally from their heterogeneous data sources,…
Taxonomy
Topics: Research Data Management Practices · Biomedical Text Mining and Ontologies · Semantic Web and Ontologies
