Measuring the severity of multi-collinearity in high dimensions
Wei Q. Deng, Radu V. Craiu, Lei Sun

TL;DR
This paper introduces new measures to quantify and visualize multi-collinearity in high-dimensional data, addressing limitations of classic tools and enabling better analysis in genomic studies.
Contribution
It proposes individualized and global measures for assessing multi-collinearity in high-dimensional settings, applicable to complex genomic data.
Findings
The measures visually distinguish genomic regions of excessive multi-collinearity.
They reveal differences in the level of multi-collinearity between continental populations.
The individualized measures make multi-collinearity patterns easier to visualize and interpret.
Abstract
Multi-collinearity is a widespread phenomenon in modern statistical applications and, when ignored, can negatively impact model selection and statistical inference. Classic tools and measures that were developed for "n > p" data are neither applicable nor interpretable in the high-dimensional regime. Here we propose 1) new individualized measures that can be used to visualize patterns of multi-collinearity, and subsequently 2) global measures to assess the overall burden of multi-collinearity without limiting the observed data dimensions. We applied these measures in genomic applications to investigate patterns of multi-collinearity in genetic variation across individuals with diverse ancestral backgrounds. The measures were able to visually distinguish genomic regions of excessive multi-collinearity and to contrast the level of multi-collinearity between different continental populations.
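The abstract's claim that classic measures break down in the high-dimensional regime can be illustrated with the variance inflation factor (VIF). This page does not name which classic tools it means, so VIF is used here only as a representative example. The sketch below is not the authors' proposed measure. It is a minimal NumPy illustration that assumes VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares regression of column j on all other columns plus an intercept. The helper name `classic_vif` is hypothetical. When p ≥ n, each of these regressions fits the data exactly. As a result, every R_j^2 equals 1 and every VIF is infinite, which says nothing about which variables are actually collinear.

```python
import numpy as np


def classic_vif(X: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Classic VIF for each column of X (hypothetical helper, for illustration).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from an OLS regression of
    column j on the remaining columns plus an intercept. Returns np.inf
    when R_j^2 is numerically 1 (an exact fit).
    """
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept column followed by the other p - 1 columns.
        A = np.column_stack([np.ones(n), others])
        # Least-squares fit; when p >= n this interpolates y exactly.
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 - tol else 1.0 / (1.0 - r2)
    return vifs


rng = np.random.default_rng(0)

# "n > p": 200 samples, 5 independent columns -> every VIF is close to 1.
X_low = rng.standard_normal((200, 5))
print(classic_vif(X_low))

# High-dimensional "p > n": 50 samples, 80 columns -> every VIF is infinite,
# even though the columns were generated independently.
X_high = rng.standard_normal((50, 80))
print(np.isinf(classic_vif(X_high)).all())
```

In the high-dimensional case, the classic measure flags every variable identically. That uninformative behavior is what motivates measures designed specifically for p > n data.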
Taxonomy
Topics: Bioinformatics and Genomic Networks · Computational Drug Discovery Methods · Gene expression and cancer classification
