# Integrative Factorization of Bidimensionally Linked Matrices

**Authors:** Jun Young Park, Eric F. Lock

arXiv: 1906.03722 · 2020-02-10

## TL;DR

BIDIFAC is a new method for integrating and analyzing bidimensionally linked biomedical data matrices, capturing shared and unique variability across multiple cohorts and platforms.

## Contribution

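The paper introduces a bidimensional factorization whose penalized objective extends single-matrix nuclear norm penalization to matrices linked across both cohorts and platforms. Tuning parameters are chosen using results from random matrix theory rather than by explicit rank selection.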
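As background for the single-matrix case that the method extends, here is a minimal sketch of nuclear norm penalization, assuming NumPy. It relies on the standard result that minimizing `0.5 * ||X - A||_F^2 + lam * ||A||_*` is solved by soft-thresholding the singular values of `X`. The function name `soft_threshold_svd`, the simulated data, and the choice `lam = sqrt(m) + sqrt(n)` (a common random-matrix bound on the largest singular value of unit-variance Gaussian noise) are illustrative assumptions, not the authors' R implementation.

```python
import numpy as np

def soft_threshold_svd(X, lam):
    """Solve min_A 0.5*||X - A||_F^2 + lam*||A||_* by shrinking singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # singular values at or below lam become zero
    return (U * s_shrunk) @ Vt

# Hypothetical simulated data: rank-2 signal plus unit-variance Gaussian noise.
rng = np.random.default_rng(0)
m, n, true_rank = 50, 40, 2
signal = rng.normal(size=(m, true_rank)) @ rng.normal(size=(true_rank, n))
X = signal + rng.normal(size=(m, n))

# Assumed tuning value motivated by random matrix theory (unit noise variance).
lam = np.sqrt(m) + np.sqrt(n)
A_hat = soft_threshold_svd(X, lam)
est_rank = np.linalg.matrix_rank(A_hat)
print("estimated rank:", est_rank)
```

Because the penalty zeroes out small singular values, the rank of the estimate is determined by the threshold instead of being fixed in advance, which is the sense in which tuning-parameter selection can stand in for rank selection.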

## Key findings

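- Applied to TCGA breast cancer data, integrating two platforms (mRNA and miRNA expression) across two cohorts (tumor and normal tissue samples)
- Factorizes the data into globally shared, row-shared, column-shared, and single-matrix structural components
- Provides R code for fitting BIDIFAC, imputing missing values, and generating simulated data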

## Abstract

Advances in molecular "omics" technologies have motivated new methodology for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (e.g., multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose BIDIFAC (Bidimensional Integrative Factorization) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes the data into (i) globally shared, (ii) row-shared, (iii) column-shared, and (iv) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation we use a penalized objective function that extends the nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from random matrix theory to choose tuning parameters. We apply our method to integrate two genomics platforms (mRNA and miRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from TCGA. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data.
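The four-part decomposition described in the abstract can be sketched in equations. The notation below is assumed for illustration: the data are a $p \times q$ grid of blocks $X_{ij}$, a dot subscript denotes concatenation over that index, $\|\cdot\|_*$ is the nuclear norm, and the $\lambda$ terms are tuning parameters. The exact formulation should be checked against the full paper.

$$
X_{ij} = G_{ij} + R_{ij} + C_{ij} + I_{ij} + E_{ij}, \qquad i = 1, \dots, p, \quad j = 1, \dots, q
$$

Here $G$ is the globally shared structure, $R$ is shared across the blocks in a row, $C$ is shared across the blocks in a column, $I$ is specific to a single block, and $E$ is noise. One way to write a penalized objective that extends the single-matrix nuclear norm penalty is:

$$
\min_{G, R, C, I} \; \tfrac{1}{2} \left\| X_{\cdot\cdot} - G_{\cdot\cdot} - R_{\cdot\cdot} - C_{\cdot\cdot} - I_{\cdot\cdot} \right\|_F^2
+ \lambda_G \left\| G_{\cdot\cdot} \right\|_*
+ \sum_{i=1}^{p} \lambda_{R_i} \left\| R_{i\cdot} \right\|_*
+ \sum_{j=1}^{q} \lambda_{C_j} \left\| C_{\cdot j} \right\|_*
+ \sum_{i=1}^{p} \sum_{j=1}^{q} \lambda_{I_{ij}} \left\| I_{ij} \right\|_*
$$

Each penalty acts on the concatenation over which its component is meant to be low rank: the full grid for $G$, one row of blocks for each $R_{i\cdot}$, one column of blocks for each $C_{\cdot j}$, and a single block for each $I_{ij}$.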

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.03722/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1906.03722/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1906.03722/full.md

---
Source: https://tomesphere.com/paper/1906.03722