# SPHN Connector - a scalable pipeline for generating validated knowledge graphs from federated and semantically enriched health data

**Authors:** Vasundra Touré, Deepak Unni, Philip Krauss, Andrea Brites Marto, Katie Kalt, Nicola Stoira, Maximilian Pickl, Sabine Österle

PMC · DOI: 10.1186/s12911-026-03383-7 · 2026-02-13

## TL;DR

The SPHN Connector is a tool that helps health institutions create standardized, privacy-protected knowledge graphs from their data, enabling better data sharing and reuse in biomedical research.

## Contribution

The SPHN Connector introduces a scalable, federated pipeline for generating validated, semantically enriched knowledge graphs from decentralized health data sources.

## Key findings

- The SPHN Connector allows institutions to build semantically enriched knowledge graphs locally while maintaining data governance.
- It supports federated data integration, enabling linkage of clinical and omics data from the same patient across different sites.
- The tool facilitates data transformation, de-identification, and validation to ensure compliance with Semantic Web standards.

## Abstract

The integration and reuse of heterogeneous health data, including clinical records, cohort studies, and omics datasets, are essential for advancing modern biomedical research. Knowledge graphs offer a powerful means to semantically link such data, enabling interoperability and reuse. The Swiss Personalized Health Network has developed a comprehensive semantic interoperability framework to implement the FAIR (Findable, Accessible, Interoperable, Reusable) principles at a national level.

This paper presents the strategy adopted and the resulting SPHN Connector, a tool that enables data providers to transform their local data into semantically enriched knowledge graphs following RDF and related Semantic Web standards. Rather than requiring centralized data transformation, the SPHN Connector allows each institution to build knowledge graphs locally from its heterogeneous data sources, maintaining data governance at the source while ensuring semantic interoperability across sites.
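As a minimal sketch of local knowledge graph construction, the snippet below maps rows of a tabular export to RDF triples in N-Triples syntax using only the Python standard library. The column names, the `LabResult` class, the `hasSubjectPseudoIdentifier`, `hasValue` and `hasUnit` properties, and the `example.org` namespaces are hypothetical placeholders, not the official SPHN schema, and the code is not the SPHN Connector's implementation.

```python
import csv
import io

# Placeholder namespaces (hypothetical; NOT the official SPHN schema IRIs).
SCHEMA = "https://example.org/schema#"
DATA = "https://example.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value: str, datatype: str) -> str:
    """Serialize a typed N-Triples literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^<{XSD}{datatype}>'


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Map each row of a hypothetical lab-result export to four RDF triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Identifiers are assumed to be IRI-safe; a real pipeline would encode them.
        patient = f"<{DATA}SubjectPseudoIdentifier/{row['patient_id']}>"
        result = f"<{DATA}LabResult/{row['result_id']}>"
        triples.append(f"{result} <{RDF_TYPE}> <{SCHEMA}LabResult> .")
        triples.append(f"{result} <{SCHEMA}hasSubjectPseudoIdentifier> {patient} .")
        triples.append(f"{result} <{SCHEMA}hasValue> {literal(row['value'], 'double')} .")
        triples.append(f"{result} <{SCHEMA}hasUnit> {literal(row['unit'], 'string')} .")
    return triples


if __name__ == "__main__":
    export = "patient_id,result_id,value,unit\nP001,R42,5.4,mmol/L\n"
    print("\n".join(rows_to_ntriples(export)))
```

A production pipeline would more likely rely on an RDF library and map to the published SPHN schema; the sketch only shows that each institution can emit standards-based triples from its own records without moving the raw data elsewhere.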

The SPHN Connector tackles the technical challenges of federated knowledge graph construction. It converts diverse data formats into SPHN-compliant, semantically enriched RDF and offers capabilities for data transformation, de-identification, and validation, particularly for iterative deliveries.
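The de-identification and validation steps can be illustrated with a generic sketch. It uses keyed pseudonymization (HMAC-SHA256) and per-patient date shifting, two common de-identification techniques, plus a hand-rolled required-property check standing in for shape-based validation such as SHACL. None of this is taken from the SPHN Connector: `SITE_KEY`, the 30-day shift window, and the `REQUIRED` table are hypothetical choices for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical site-local secret; a real deployment would keep it in a secrets store.
SITE_KEY = b"replace-with-a-site-secret"


def pseudonymize(patient_id: str, key: bytes = SITE_KEY) -> str:
    """Replace a local identifier with a keyed pseudonym (truncated HMAC-SHA256)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def shift_date(d: date, patient_id: str, key: bytes = SITE_KEY, max_days: int = 30) -> date:
    """Shift a date by a per-patient offset in [-max_days, max_days], stable for a fixed key."""
    digest = hmac.new(key, b"date-shift:" + patient_id.encode("utf-8"), hashlib.sha256).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)


# Minimal stand-in for shape-based validation: required properties per (hypothetical) class.
REQUIRED = {"LabResult": {"hasSubjectPseudoIdentifier", "hasValue", "hasUnit"}}


def validate(resources: dict[str, dict]) -> list[str]:
    """Return one violation message per missing required property."""
    errors = []
    for iri, props in resources.items():
        required = REQUIRED.get(props.get("type", ""), set())
        for missing in sorted(required - set(props)):
            errors.append(f"{iri}: missing {missing}")
    return errors


if __name__ == "__main__":
    record = {
        "type": "LabResult",
        "hasSubjectPseudoIdentifier": pseudonymize("P001"),
        "hasValue": 5.4,
        "recordedOn": shift_date(date(2024, 3, 1), "P001"),
    }
    print(validate({"LabResult/R42": record}))  # ['LabResult/R42: missing hasUnit']
```

Because both the pseudonym and the date offset are derived deterministically from the key and the local identifier, the same patient receives the same pseudonym and the same shift in every run, which keeps successive deliveries consistent with one another.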

The generated datasets can then be either integrated centrally or used in a federated way, allowing information from the same patient to be linked (for example, clinical routine data and omics metadata) and data from different patients to be combined across sites.
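To illustrate patient-level linkage, the sketch below merges triples from two hypothetical sites and groups the linked resources by patient. It assumes, as a simplification, that both providers refer to the patient with the same pseudonymous IRI and use a shared `hasSubjectPseudoIdentifier` property; the triple layout, property names and IRIs are hypothetical, and the sketch does not show how SPHN actually coordinates identifiers across sites.

```python
from collections import defaultdict

# A triple is (subject, predicate, object); IRIs are shortened strings for readability.
Triple = tuple[str, str, str]

HAS_PATIENT = "hasSubjectPseudoIdentifier"  # hypothetical linking property
TYPE = "type"


def link_by_patient(*graphs: set[Triple]) -> dict[str, dict[str, list[str]]]:
    """Union the graphs, then group resources by patient IRI and by resource type."""
    merged = set().union(*graphs)
    # Assumes one type per resource; with several, an arbitrary one is kept.
    types = {s: o for s, p, o in merged if p == TYPE}
    linked = defaultdict(lambda: defaultdict(list))
    for s, p, o in sorted(merged):
        if p == HAS_PATIENT:
            linked[o][types.get(s, "Unknown")].append(s)
    return {patient: dict(by_type) for patient, by_type in linked.items()}


if __name__ == "__main__":
    clinical = {  # e.g. routine data exported by site A
        ("siteA/LabResult/R42", TYPE, "LabResult"),
        ("siteA/LabResult/R42", HAS_PATIENT, "patient/9f2c"),
    }
    omics = {  # e.g. omics metadata exported by site B
        ("siteB/Sample/S7", TYPE, "Sample"),
        ("siteB/Sample/S7", HAS_PATIENT, "patient/9f2c"),
    }
    print(link_by_patient(clinical, omics))
    # {'patient/9f2c': {'LabResult': ['siteA/LabResult/R42'], 'Sample': ['siteB/Sample/S7']}}
```

The shared patient identifier is what lets clinical and omics resources be queried together; the same grouping could run over one central store or be evaluated per site in a federated setup.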

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12980851/full.md

---
Source: https://tomesphere.com/paper/PMC12980851