# Oncopacket: integration of cancer research data using GA4GH phenopackets

**Authors:** Michael Sierk, Daniel Danis, Sujay Patil, Nobal Kishor, Rajdeep Mondal, Abhishek Jha, Qingrong Chen, Chunhua Yan, Monica Munoz-Torres, Daoud Meerzaman, Peter N Robinson, Justin T Reese

PMC · DOI: 10.1093/bioinformatics/btaf546 · Bioinformatics · 2025-09-29

## TL;DR

Oncopacket converts cancer case data from the National Cancer Institute into GA4GH Phenopackets, a standard format that makes clinical and genetic information easier to analyze and reuse.

## Contribution

The paper introduces Oncopacket, a software package that harmonizes genetic and clinical cancer data into the GA4GH Phenopacket schema, an ISO standard for representing clinical case data.
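To make the target format concrete, the following is a minimal sketch of what a Phenopacket-shaped record for one cancer case might look like. It uses plain Python dictionaries rather than Oncopacket's own code, which is not shown in this summary. The function `build_case_packet`, the case identifier, and all values are hypothetical. The field names (`subject`, `vitalStatus`, `survivalTimeInDays`, `diseases`, `metaData`) follow one reading of the Phenopacket v2 JSON layout and should be checked against the schema documentation. A schema-valid packet needs further metadata fields that are omitted here.

```python
import json


def build_case_packet(case_id, sex, status, survival_days, disease_id, disease_label):
    """Assemble a Phenopacket-shaped dict for one cancer case (illustrative only)."""
    return {
        "id": case_id,
        "subject": {
            "id": case_id,
            "sex": sex,
            # Survival information is attached to the individual.
            "vitalStatus": {
                "status": status,
                "survivalTimeInDays": survival_days,
            },
        },
        # Each diagnosis is an ontology term: a CURIE plus a human-readable label.
        "diseases": [
            {"term": {"id": disease_id, "label": disease_label}}
        ],
        # Hypothetical, incomplete metadata; a real packet carries more fields.
        "metaData": {
            "createdBy": "example-script",
            "phenopacketSchemaVersion": "2.0",
        },
    }


# All values below are invented for illustration.
packet = build_case_packet(
    "case-001", "FEMALE", "DECEASED", 420, "MONDO:0001657", "brain cancer"
)
print(json.dumps(packet, indent=2))
```

Keeping every case in the same nested shape is what lets downstream code read demographic, diagnosis, and survival fields without source-specific parsing.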

## Key findings

- Oncopacket integrates demographic, mutation, morphology, diagnosis, intervention, and survival data from National Cancer Institute case data for 12 cancer types.
- The integrated data recapitulate a known association between IDH1 mutations and survival time in brain cancer patients.
- The GA4GH Phenopacket schema provides a foundation for advanced statistical and AI/ML analyses.

## Abstract

Lack of data integration remains a significant impediment to cancer research, and many analyses still require customized software to transform and prepare cancer data. We describe a software package to harmonize genetic and clinical cancer data into the GA4GH Phenopacket schema, an ISO standard for representing clinical case data. We integrated demographic, mutation, morphology, diagnosis, intervention, and survival data using case data from the National Cancer Institute for 12 cancer types. The Phenopacket standard provides a foundation for downstream use, including sophisticated statistical and AI/ML analyses. We demonstrate fitness for purpose by using the integrated data to recapitulate a known association between mutations in the gene encoding isocitrate dehydrogenase 1 and survival time in brain cancer patients.
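As an illustration of the kind of survival comparison mentioned above, here is a minimal Kaplan-Meier sketch in plain Python. The summary does not state which statistical method the authors used, so the estimator is a standard technique swapped in for illustration and not the paper's analysis. The function `kaplan_meier` and both groups are hypothetical, and all times and event flags are invented. In practice the two groups could be cases with and without an IDH1 mutation, with survival times read from the integrated records.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time per patient (for example, days)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the fraction of at-risk patients who survive past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Patients who died or were censored at t leave the risk set.
        at_risk -= leaving
    return curve


# Invented toy data for two hypothetical groups (not from the paper).
group_a = kaplan_meier([300, 500, 700, 900], [1, 0, 1, 0])
group_b = kaplan_meier([100, 200, 300, 400], [1, 1, 1, 0])

for name, curve in (("group_a", group_a), ("group_b", group_b)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

A real analysis would also test whether the curves differ, for example with a log-rank test from an established statistics library, and would handle much larger cohorts.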

Source code is freely available at: https://github.com/monarch-initiative/oncopacket (archived at 10.5281/zenodo.15353125).

## Linked entities

- **Genes:** IDH1 (isocitrate dehydrogenase 1) [NCBI Gene 3417]
- **Diseases:** brain cancer (MONDO:0001657)

## Full-text entities

- **Genes:** IDH1 (isocitrate dehydrogenase (NADP(+)) 1) [NCBI Gene 3417] {aka HEL-216, HEL-S-26, IDCD, IDH, IDP, IDPC}
- **Diseases:** brain cancer (MESH:D001932), Cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12516310/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12516310/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12516310/full.md

---
Source: https://tomesphere.com/paper/PMC12516310