# GRACKLE: an interpretable matrix factorization approach for biomedical representation learning

**Authors:** Lucas A Gillenwater, Lawrence E Hunter, James C Costello

PMC · DOI: 10.1093/bioinformatics/btaf213 · 2025-07-15

## TL;DR

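GRACKLE is a nonnegative matrix factorization method that uses prior biological knowledge (sample metadata and molecular relationships) to improve the discovery of disease-associated gene expression patterns, especially when sample sizes are small.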

## Contribution

GRACKLE jointly integrates two sources of prior knowledge into nonnegative matrix factorization: a sample similarity matrix derived from sample metadata and a gene similarity matrix derived from molecular relationships.

## Key findings

- In simulation studies, GRACKLE outperformed other NMF algorithms, especially as background noise increased.
- GRACKLE identified condition-enriched subgroups in breast tumors and Down syndrome samples.
- The model's latent representations aligned with known biological patterns in Down syndrome, such as autoimmune conditions and sleep apnea.

## Abstract

Disruption in normal gene expression can contribute to the development of diseases and chronic conditions. However, identifying disease-specific gene signatures can be challenging due to the presence of multiple co-occurring conditions and limited sample sizes. Unsupervised representation learning methods, such as matrix decomposition and deep learning, simplify high-dimensional data into understandable patterns, but often do not provide clear biological explanations. Incorporating prior biological knowledge directly can enhance understanding and address small sample sizes. Nevertheless, current models do not jointly consider prior knowledge of molecular interactions and sample labels.

We present GRACKLE, a novel nonnegative matrix factorization approach that applies Graph Regularization Across Contextual KnowLedgE. GRACKLE integrates sample similarity and gene similarity matrices based on sample metadata and molecular relationships, respectively. Simulation studies show GRACKLE outperformed other NMF algorithms, especially with increased background noise. GRACKLE effectively stratified breast tumor samples and identified condition-enriched subgroups in individuals with Down syndrome. The model's latent representations aligned with known biological patterns, such as autoimmune conditions and sleep apnea in Down syndrome. GRACKLE's flexibility allows application to various data modalities, offering a robust solution for identifying context-specific molecular mechanisms in biomedical research.

GRACKLE is available at: https://github.com/lagillenwater/GRACKLE.
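
The abstract describes regularizing an NMF with a sample similarity graph and a gene similarity graph. As a rough illustration of that general idea, the sketch below implements a generic dual graph-regularized NMF with standard multiplicative updates. It is not the authors' implementation (see the repository above for that): the objective, the update rules, the function names, the parameters `lam_samples` and `lam_genes`, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np


def laplacian(A):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    return np.diag(A.sum(axis=1)) - A


def objective(X, W, H, A_samples, A_genes, lam_samples, lam_genes):
    """Assumed objective: ||X - WH||_F^2 + lam_s*tr(W^T L_s W) + lam_g*tr(H L_g H^T)."""
    recon = np.linalg.norm(X - W @ H) ** 2
    reg_s = np.trace(W.T @ laplacian(A_samples) @ W)
    reg_g = np.trace(H @ laplacian(A_genes) @ H.T)
    return recon + lam_samples * reg_s + lam_genes * reg_g


def graph_regularized_nmf(X, A_samples, A_genes, k, lam_samples=1.0,
                          lam_genes=1.0, n_iter=200, eps=1e-10, seed=0):
    """Factor X (samples x genes) into nonnegative W (samples x k) and H (k x genes)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    D_s = np.diag(A_samples.sum(axis=1))  # degree matrix of the sample graph
    D_g = np.diag(A_genes.sum(axis=1))    # degree matrix of the gene graph
    history = [objective(X, W, H, A_samples, A_genes, lam_samples, lam_genes)]
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative; eps avoids division by zero.
        W *= (X @ H.T + lam_samples * A_samples @ W) / (
            W @ H @ H.T + lam_samples * D_s @ W + eps)
        H *= (W.T @ X + lam_genes * H @ A_genes) / (
            W.T @ W @ H + lam_genes * H @ D_g + eps)
        history.append(objective(X, W, H, A_samples, A_genes, lam_samples, lam_genes))
    return W, H, history


def same_label_graph(labels):
    """Hypothetical similarity graph: connect items that share a label."""
    labels = np.asarray(labels)
    A = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(A, 0.0)
    return A


# Toy data: 6 samples in two metadata groups, 8 genes in two interaction modules.
rng = np.random.default_rng(42)
X = rng.random((6, 8))
A_samples = same_label_graph([0, 0, 0, 1, 1, 1])
A_genes = same_label_graph([0, 0, 0, 0, 1, 1, 1, 1])

W, H, history = graph_regularized_nmf(X, A_samples, A_genes, k=2)
print(W.shape, H.shape, history[0] > history[-1])
```

In this sketch the two Laplacian penalties pull connected samples toward similar rows of `W` and connected genes toward similar columns of `H`, which is one common way to encode prior knowledge in a factorization. How GRACKLE builds its similarity matrices and optimizes its objective should be taken from the paper and repository rather than from this example.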

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989), Down syndrome (MONDO:0008608), sleep apnea (MONDO:0005296)

## Full-text entities

- **Diseases:** sleep apnea (MESH:D012891), breast tumor (MESH:D001943), Down syndrome (MESH:D004314), autoimmune conditions (MESH:D001327)

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12261436/full.md

---
Source: https://tomesphere.com/paper/PMC12261436