# pyBiodatafuse: extending interoperability of data using modular queries across biomedical resources

**Authors:** Yojana Gadiya, Javier Millán Acosta, Ammar Ammar, Alejandro Adriaque Lozano, Delano Wetstede, Dominik Martinát, Ana Claudia Sima, Hailiang Mei, Egon Willighagen, Tooba Abbassi-Daloii

PMC · DOI: 10.1093/bioinformatics/btag064 · Bioinformatics · 2026-02-15

## TL;DR

pyBiodatafuse is a query-based Python tool that integrates biomedical databases into customizable, context-specific knowledge graphs.

## Contribution

pyBiodatafuse introduces a modular framework for dynamically generating context-specific knowledge graphs from biomedical data.

## Key findings

- pyBiodatafuse enables on-the-fly creation of knowledge graphs from gene or metabolite identifiers.
- The tool offers plugins for Cytoscape, Neo4j, and GraphDB, so the resulting property and RDF graphs can be hosted locally.
- As a demonstration, the authors generated a post-COVID syndrome knowledge graph from differential gene expression data.

## Abstract

Integrating omics data analysis with publicly available databases is crucial for unravelling complex biological mechanisms. However, this integration process is often intricate and time-consuming due to the diversity and complexity of the data involved. Achieving consistent harmonization across data types is challenging when managing disparate formats and sources. To address these issues, we introduce pyBiodatafuse, a query-based Python tool designed to integrate biomedical databases. This tool establishes a modular framework that simplifies data wrangling, enabling the creation of context-specific knowledge graphs (KGs) while supporting graph-based analyses.
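The idea of a modular, query-based framework can be illustrated with a minimal sketch. This is not the pyBiodatafuse API: the function names (`toy_disease_source`, `toy_pathway_source`, `combine`) are hypothetical, and the in-memory lookups with placeholder identifiers stand in for real database queries. It only shows how independent per-source annotators might be merged into one table keyed by input identifier.

```python
from typing import Callable, Dict, List

# An annotator takes a list of identifiers and returns, per identifier,
# the list of entities that one data source associates with it.
Annotator = Callable[[List[str]], Dict[str, List[str]]]


def toy_disease_source(gene_ids: List[str]) -> Dict[str, List[str]]:
    """Hypothetical stand-in for a disease-association query (placeholder data)."""
    lookup = {"GENE_A": ["DISEASE_X"], "GENE_B": ["DISEASE_X", "DISEASE_Y"]}
    return {g: lookup.get(g, []) for g in gene_ids}


def toy_pathway_source(gene_ids: List[str]) -> Dict[str, List[str]]:
    """Hypothetical stand-in for a pathway query (placeholder data)."""
    lookup = {"GENE_A": ["PATHWAY_1"]}
    return {g: lookup.get(g, []) for g in gene_ids}


def combine(
    gene_ids: List[str], sources: Dict[str, Annotator]
) -> Dict[str, Dict[str, List[str]]]:
    """Run every annotator and merge the results per input identifier."""
    combined: Dict[str, Dict[str, List[str]]] = {g: {} for g in gene_ids}
    for name, annotate in sources.items():
        result = annotate(gene_ids)
        for g in gene_ids:
            # Identifiers a source knows nothing about get an empty list.
            combined[g][name] = result.get(g, [])
    return combined


table = combine(
    ["GENE_A", "GENE_B"],
    {"disease": toy_disease_source, "pathway": toy_pathway_source},
)
```

In this sketch, adding a data source only requires registering one more function in the `sources` dictionary, which is the sense of "modular" assumed here.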

We developed a pipeline for generating context-specific knowledge graphs dynamically, allowing users to create KGs on the fly from a set of gene or metabolite identifiers. pyBiodatafuse features a user-friendly interface that streamlines this process, making it accessible even to researchers without extensive computational expertise. Additionally, the tool offers plugins for widely used platforms such as Cytoscape, Neo4j, and GraphDB, enabling local hosting of resulting property and RDF graphs. This versatility ensures that generated KGs can be efficiently utilized within diverse research workflows. To demonstrate its potential, we used pyBiodatafuse to create a graph for post-COVID syndrome using differential gene expression data, showcasing its ability to build adaptable and context-specific knowledge representations. Thus, pyBiodatafuse sets the stage for streamlined data integration, empowering researchers to focus on discovery and analysis without being hindered by data management complexities.
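A rough sketch of the step from per-identifier annotations to a graph may help make the pipeline concrete. Everything here is an assumption for illustration: `build_graph` and `to_cytoscape_json` are hypothetical helpers, the identifiers are placeholders, and the JSON layout only loosely follows the Cytoscape.js elements convention. The actual pyBiodatafuse exporters may work differently.

```python
import json
from typing import Dict, List


def build_graph(annotations: Dict[str, Dict[str, List[str]]]) -> Dict[str, list]:
    """Turn {gene: {source: [targets]}} into node and edge lists."""
    nodes: Dict[str, dict] = {}
    edges: List[dict] = []
    for gene, by_source in annotations.items():
        nodes[gene] = {"id": gene, "type": "gene"}
        for source, targets in by_source.items():
            for target in targets:
                # A target shared by several genes becomes a single node.
                nodes.setdefault(target, {"id": target, "type": source})
                edges.append(
                    {"source": gene, "target": target, "label": f"has_{source}"}
                )
    return {"nodes": list(nodes.values()), "edges": edges}


def to_cytoscape_json(graph: Dict[str, list]) -> str:
    """Serialise the graph in a Cytoscape.js-style 'elements' layout (assumed)."""
    elements = {
        "nodes": [{"data": node} for node in graph["nodes"]],
        "edges": [{"data": edge} for edge in graph["edges"]],
    }
    return json.dumps({"elements": elements}, indent=2)


# Placeholder annotations; not real biological associations.
annotations = {
    "GENE_A": {"disease": ["DISEASE_X"], "pathway": ["PATHWAY_1"]},
    "GENE_B": {"disease": ["DISEASE_X"]},
}
graph = build_graph(annotations)
cyjs = to_cytoscape_json(graph)
```

Because shared targets collapse into one node, genes that point to the same disease or pathway end up connected through it, which is what makes the result usable for graph-based analysis.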

pyBiodatafuse is open source. Its source code is available at https://github.com/BioDataFuse/pyBiodatafuse and its PyPI package at https://pypi.org/project/pyBiodatafuse/. The user interface can be accessed at https://biodatafuse.org/. A release is also archived on Zenodo at https://doi.org/10.5281/zenodo.18468942.

## Full-text entities

- **Diseases:** fatigue (MESH:D005221), weakness (MESH:D018908), Long COVID (MESH:D000094024), myalgia (MESH:D063806), COVID (MESH:D000086382), toxicity (MESH:D064420), hallucinations (MESH:D006212)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12949461/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12949461/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12949461/full.md

---
Source: https://tomesphere.com/paper/PMC12949461