# SOLVE: A structured orthogonal latent variable framework for disentangling confounding in matrix data

**Authors:** Jialai She, Gil Alterovitz

PMC · DOI: 10.1093/biomethods/bpaf094 · Biology Methods & Protocols · 2026-01-28

## TL;DR

This paper introduces SOLVE, a latent variable method that separates the effects of measured covariates from hidden (latent) variation in matrix data, improving the recovery of gene-drug associations in pharmacogenomics.

## Contribution

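A structured orthogonal latent variable framework that imposes orthogonality constraints, relative to the spans of the row and column predictor matrices, to separate measured covariate effects from low-rank latent variation, improving identifiability and interpretability.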

## Key findings

- The method recovers biologically coherent gene-drug associations missed by standard models, such as the EGFR-inhibitor link.
- It identifies novel gene programs aligned with drug mechanisms, including a latent unfolded-protein-response module affecting drug sensitivity.
- A parametric bootstrap provides valid inference on feature-outcome associations under the regularized low-rank structure, supporting biomarker discovery for precision oncology.

## Abstract

Latent factor models are valuable in bioinformatics for accounting for unmeasured variation alongside observed covariates. Yet many methods struggle to separate known effects from latent structure and to handle losses beyond standard regression. We present a unified framework that augments row and column predictors with a low-rank latent component, jointly modeling measured effects and residual variation. To remove ambiguity in estimating observed and latent effects, we impose a carefully designed set of orthogonality constraints on the coefficient and latent factor matrices, relative to the spans of the predictor matrices. These constraints ensure identifiability, yield a decomposition in which the latent term captures only variation unexplained by the covariates, and improve interpretability. An efficient algorithm handles general non-quadratic losses via surrogates with monotone descent. Each iteration updates the latent term by truncated singular value decomposition of a doubly projected residual and refines coefficients by projections. The number of latent factors is selected by applying an elbow rule to a degrees-of-freedom-adjusted information criterion. A parametric bootstrap provides valid inference on feature-outcome associations under the regularized low-rank structure. Applied to real pharmacogenomic data, the method recovers biologically coherent gene-drug associations missed by standard factor models, such as the EGFR-inhibitor link, highlights novel candidates with plausible mechanisms, and reveals gene programs aligned with compound modes of action, including a latent unfolded-protein-response module affecting drug sensitivity. These results support the framework’s utility for precision oncology, yielding stronger biomarkers for patient stratification and deeper insight into drug resistance mechanisms.
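The latent update described in the abstract (a truncated singular value decomposition of a doubly projected residual, with coefficients refined by projections) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a squared-error loss, for which a single pass suffices, whereas the paper handles general non-quadratic losses through iterated surrogates. It also assumes an additive model `Y ≈ X A + B Zᵀ + L`, which the abstract does not spell out. The names `projector`, `solve_sketch`, `A`, `B`, and `L` are hypothetical and chosen only for illustration.

```python
import numpy as np


def projector(M):
    """Orthogonal projector onto the column span of M."""
    return M @ np.linalg.pinv(M)


def solve_sketch(Y, X, Z, rank):
    """One-pass squared-error sketch (assumed model: Y ~ X A + B Z^T + L).

    Y: (n, m) data matrix; X: (n, p) row predictors; Z: (m, q) column
    predictors; rank: number of latent factors. Hypothetical helper,
    not the paper's algorithm for general losses.
    """
    n, m = Y.shape
    M_X = np.eye(n) - projector(X)  # removes the span of X from the rows
    M_Z = np.eye(m) - projector(Z)  # removes the span of Z from the columns

    # Coefficients obtained by projections onto the predictor spans.
    A = np.linalg.pinv(X) @ Y            # (p, m): row-predictor effects
    B = M_X @ Y @ np.linalg.pinv(Z).T    # (n, q): column-predictor effects

    # Latent term: truncated SVD of the doubly projected residual.
    R = M_X @ Y @ M_Z
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return A, B, L


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 3))
    Z = rng.normal(size=(20, 2))
    Y = rng.normal(size=(30, 20))
    A, B, L = solve_sketch(Y, X, Z, rank=2)
    # By construction the latent term is orthogonal to both predictor spans.
    print(np.abs(X.T @ L).max(), np.abs(L @ Z).max())
```

Because the latent term is built from the residual after projecting out both predictor spans, it satisfies `Xᵀ L = 0` and `L Z = 0` in this sketch, which mirrors the abstract's statement that the latent component captures only variation unexplained by the covariates.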

## Linked entities

- **Genes:** EGFR (epidermal growth factor receptor) [NCBI Gene 1956]

## Full-text entities

- **Genes:** EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12848822/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12848822/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12848822/full.md

---
Source: https://tomesphere.com/paper/PMC12848822