# Asymmetric integration of various cancer datasets for identifying risk-associated variants and genes

**Authors:** Ruixuan Wang, Lam Tran, Benjamin Brennan, Lars G Fritsche, Kevin He, J Chad Brenner, Hui Jiang

PMC · DOI: 10.1093/bioadv/vbaf253 · 2025-10-14

## TL;DR

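This paper adapts a recently developed asymmetric integration method to combine matched case–control genotype datasets across cancers from the Michigan Genomics Initiative, identifying more risk-associated variants and genes at the same false discovery rate.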

## Contribution

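The authors adapt an asymmetric integration method to matched case–control genotype data, treating each cancer in turn as the primary dataset and the other cancers as auxiliary datasets. The method aims to increase statistical power by handling heterogeneity across datasets and excluding auxiliary datasets that do not help.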

## Key findings

- At the same false discovery rate, the integrated analysis identified more candidate genetic variants and genes associated with the risks of various cancers.
- The method successfully handles matched case–control study designs using conditional logistic regression models.
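
To illustrate the second finding, here is a minimal sketch of conditional logistic regression for the simplest matched design, 1:1 case–control pairs. It is not the authors' implementation (their code is in the linked repository). The function name `fit_conditional_logit_pairs` and the 1:1 matching ratio are assumptions made for illustration. With one case and one control per pair, the conditional likelihood reduces to an intercept-free logistic model on the within-pair covariate differences, which is fitted here by Newton's method.

```python
import numpy as np


def fit_conditional_logit_pairs(x_case, x_control, n_iter=50, tol=1e-10):
    """Fit conditional logistic regression for 1:1 matched pairs.

    Hypothetical helper for illustration only. Row i of x_case and
    x_control holds the covariates (e.g. genotype dosage) of the case
    and the control in matched pair i.
    """
    # Only the within-pair difference enters the conditional likelihood.
    d = np.asarray(x_case, dtype=float) - np.asarray(x_control, dtype=float)
    if d.ndim == 1:
        d = d[:, None]

    beta = np.zeros(d.shape[1])
    for _ in range(n_iter):
        # Probability that the observed case is the case, given the pair.
        p = 1.0 / (1.0 + np.exp(-(d @ beta)))
        # Gradient and negative Hessian of the conditional log-likelihood.
        grad = d.T @ (1.0 - p)
        neg_hess = (d * (p * (1.0 - p))[:, None]).T @ d
        step = np.linalg.solve(neg_hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data: 30 pairs where only the case carries the variant,
# 10 pairs where only the control does, and 10 concordant pairs.
x_case = np.array([1] * 30 + [0] * 10 + [1] * 5 + [0] * 5)
x_control = np.array([0] * 30 + [1] * 10 + [1] * 5 + [0] * 5)
beta_hat = fit_conditional_logit_pairs(x_case, x_control)
print(beta_hat)
```

For a single binary covariate, the estimate equals the log of the ratio of discordant pair counts (here 30 to 10), and concordant pairs contribute nothing. That is why matching factors drop out of the model without being estimated.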

## Abstract

Cancer genomic research provides an opportunity to identify cancer risk-associated genes, but often suffers from undesirably low statistical power due to limited sample sizes. Integrated analysis across different cancers has the potential to enhance statistical power for identifying pan-cancer risk genes. However, substantial heterogeneity across cancers makes this challenging.

Recently, a novel asymmetric integration method was developed that can deal with data heterogeneity and exclude unhelpful datasets from the analysis. We adapted and applied this method to integrate genotype datasets with matched case and control individuals from the Michigan Genomics Initiative, using each cancer in turn as the primary dataset of interest and the other cancers as auxiliary datasets. Conditional logistic regression models were coupled with the asymmetric integration framework to handle the matched case–control study design, and permutation tests were performed to control the false discovery rate (FDR). At the same FDR level, the integrated analysis found more candidate genetic variants and genes associated with the risks of various cancers, showcasing the promise of the proposed approach for integrated analysis of cancer datasets.
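
As a rough illustration of how permutation tests can be used to estimate an FDR, the sketch below compares the number of discoveries at a threshold with the average number of statistics that exceed the same threshold in permuted data. The abstract does not describe the authors' exact procedure, so the estimator, the function name `permutation_fdr`, and the toy numbers are all assumptions. In a matched design, one common way to generate the permuted statistics is to swap case and control labels within pairs, which is also assumed here rather than taken from the paper.

```python
import numpy as np


def permutation_fdr(observed, null_stats, threshold):
    """Estimate the FDR at a threshold from permutation statistics.

    Hypothetical helper for illustration only.
    observed:   shape (m,), one test statistic per variant.
    null_stats: shape (B, m), the same statistics recomputed on B
                permuted copies of the data.
    """
    observed = np.asarray(observed, dtype=float)
    null_stats = np.asarray(null_stats, dtype=float)

    n_discoveries = int(np.sum(observed >= threshold))
    if n_discoveries == 0:
        return 0.0  # nothing is called significant, so nothing can be false

    # Average number of variants per permutation that pass the threshold,
    # used as an estimate of the expected number of false discoveries.
    expected_false = float(np.mean(np.sum(null_stats >= threshold, axis=1)))
    return min(1.0, expected_false / n_discoveries)


# Toy numbers, not from the paper: 5 variants and 2 permutations.
observed = [5.0, 4.0, 3.0, 1.0, 0.5]
null_stats = [[3.5, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0]]
fdr_at_3 = permutation_fdr(observed, null_stats, threshold=3.0)
print(fdr_at_3)
```

Comparing methods "at the same FDR level" then amounts to choosing, for each method, the threshold whose estimated FDR matches the target and counting the discoveries above it.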

Our method is available as source code at https://github.com/rxxwang/integrate_cancer.

## Full-text entities

- **Diseases:** Cancer (MESH:D009369)

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12576323/full.md

---
Source: https://tomesphere.com/paper/PMC12576323