# BioRels’ data infrastructure: a scientific schema and exchange standard to transform and enhance biological data sciences

**Authors:** Jibo Wang, Amanda Turney, Lauren Murray, Andrew M Craven, Patty Bragger-Wilkinson, Bruno dos Santos, Jaroslav Martasek, Jeremy Desaphy

PMC · DOI: 10.1093/nar/gkaf254 · Nucleic Acids Research · 2025-04-04

## TL;DR

BioRels is a new data infrastructure that simplifies handling complex biological data, enabling seamless querying and data exchange.

## Contribution

BioRels introduces a standardized, automated data preparation workflow and a new exchange format called BIORJ.

## Key findings

- BioRels handles up to 145 billion data points and supports complex queries across multiple sources.
- The BIORJ format allows data to be exported and imported with dependencies and metadata intact.
- BioRels-KB is proposed as a future expansion to further enhance data preparation capabilities.
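
The idea of exporting data "with dependencies and metadata intact" can be illustrated with a small sketch. This is not the BIORJ specification, which this summary does not describe. The `Record` class, the `export_with_dependencies` and `import_records` functions, and the JSON layout are all hypothetical, chosen only to show the general technique: walk a record's references, emit dependencies before the records that need them, and attach metadata to the exported document. The example rows reuse the KRAS gene (NCBI Gene 3845) and Homo sapiens (taxon 9606) entries listed on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class Record:
    """One atomic row in a hypothetical store (not the BioRels schema)."""
    table: str
    key: str
    values: dict
    refs: tuple = ()  # (table, key) pairs this record depends on


def export_with_dependencies(store, root, source):
    """Serialize `root` plus every record it transitively references."""
    ordered, seen = [], set()

    def visit(ident):
        if ident in seen:
            return
        seen.add(ident)
        record = store[ident]
        for dep in record.refs:
            visit(dep)  # dependencies first, so an import can replay in order
        ordered.append(record)

    visit(root)
    return json.dumps({
        # Hypothetical metadata fields, for illustration only.
        "metadata": {"source": source, "record_count": len(ordered)},
        "records": [
            {"table": r.table, "key": r.key, "values": r.values,
             "refs": [list(ref) for ref in r.refs]}
            for r in ordered
        ],
    }, indent=2)


def import_records(document):
    """Rebuild a store, rejecting records whose dependencies are missing."""
    store = {}
    for item in json.loads(document)["records"]:
        refs = tuple((table, key) for table, key in item["refs"])
        missing = [ref for ref in refs if ref not in store]
        if missing:
            raise ValueError(f"unresolved dependencies: {missing}")
        store[(item["table"], item["key"])] = Record(
            item["table"], item["key"], item["values"], refs)
    return store


store = {
    ("taxon", "9606"): Record("taxon", "9606", {"name": "Homo sapiens"}),
    ("gene", "3845"): Record("gene", "3845", {"symbol": "KRAS"},
                             (("taxon", "9606"),)),
}
document = export_with_dependencies(store, ("gene", "3845"), source="example")
restored = import_records(document)
```

Ordering dependencies before dependents lets the importer validate each reference as it goes, so a truncated or hand-edited file fails loudly instead of leaving dangling identifiers.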

## Abstract

Our understanding of biology and the medicinal sciences, augmented by advances in data structures and algorithms, has led to the proliferation of thousands of open-source resources, tools, and websites built by the scientific community to access, process, store, and visualize biological data. However, such data have become increasingly complex and heterogeneous, leading to an entangled web of relationships and external identifiers. Despite the emergence of infrastructure such as data lakes, scientists are still responsible for the time-consuming and costly exercise of finding, extracting, cleaning, preparing, and maintaining such data sources while following the FAIR principles. To better understand this complexity, we lay out a representation of the mainstream data ecosystem, describing the natural relationships and concepts found in biology. Building upon it and on the fundamental principles of data unicity and atomicity, we introduce BioRels, an automated and standardized data preparation workstream that aims to improve reproducibility and speed for all scientists and handles up to 145 billion data points. BioRels supports complex queries across several data sources seamlessly and provides an exchange format, BIORJ, to export and import data with all of its dependencies and metadata. Finally, we describe the advantages, limitations, applications, and perspectives of a future approach, BioRels-KB, to expand data preparation capabilities.

## Full-text entities

- **Genes:** KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, CDC73 (cell division cycle 73) [NCBI Gene 79577] {aka C1orf28, FIHP, HPTJT, HRPT1, HRPT2, HYX}, HPRT1 (hypoxanthine phosphoribosyltransferase 1) [NCBI Gene 3251] {aka HGPRT, HPRT}
- **Diseases:** non-small cell lung carcinoma (MESH:D002289)
- **Chemicals:** adenosine (MESH:D000241), amino acid (MESH:D000596), NCT0633591 (-), sotorasib (MESH:C000706028), inosine (MESH:D007288), guanine (MESH:D006147)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11969666/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11969666/full.md

## References

99 references — full list in the complete paper: https://tomesphere.com/paper/PMC11969666/full.md

---
Source: https://tomesphere.com/paper/PMC11969666