A novel high-dimensional model for identifying regional DNA methylation QTLs
Kaiqiong Zhao, Archer Y Yang, Karim Oualkacha, Yixiao Zeng, Kathleen Klein, Marie Hudson, Inés Colmegna, Sasha Bernatsky, Celia M T Greenwood

TL;DR
This paper introduces a new statistical model to identify genetic variants that influence DNA methylation in specific regions.
Contribution
A composite sparse penalty, paired with an efficient proximal gradient descent algorithm, selects variables while encouraging smooth varying coefficients in high-dimensional data.
Findings
The proposed method outperforms sparsity-only approaches in estimation and prediction accuracy.
Including smoothness control significantly enhances the identification of regional methylation QTLs.
The method was successfully applied to asymptomatic samples from the CARTaGENE cohort.
Abstract
Varying coefficient models offer the flexibility to learn the dynamic changes of regression coefficients. Despite their good interpretability and diverse applications, in high-dimensional settings, existing estimation methods for such models have important limitations. For example, we routinely encounter the need for variable selection when faced with a large collection of covariates with nonlinear/varying effects on outcomes, and no ideal solutions exist. One illustration of this situation could be identifying a subset of genetic variants with local influence on methylation levels in a regulatory region. To address this problem, we propose a composite sparse penalty that encourages both sparsity and smoothness for the varying coefficients. We present an efficient proximal gradient descent algorithm that scales to high-dimensional predictor spaces, providing sparse solutions for the…
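The abstract names proximal gradient descent as the fitting algorithm but, as excerpted here, does not give the form of the composite penalty. As a rough illustration of the general idea only, the sketch below runs proximal gradient descent on a least-squares loss with a plain group-lasso penalty, where each group stands in for the basis coefficients of one covariate's varying effect. The group-lasso penalty, the function names, and the parameters `groups`, `lam`, and `n_iter` are all assumptions made for this example; the smoothness component of the authors' penalty is not reproduced.

```python
import numpy as np


def group_soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_2: shrink a whole group toward zero."""
    norm = np.linalg.norm(v)
    if norm <= thresh:
        return np.zeros_like(v)
    return (1.0 - thresh / norm) * v


def prox_grad_group_lasso(X, y, groups, lam, n_iter=500):
    """Minimise (1/2n)||y - X b||^2 + lam * sum_g ||b_g||_2.

    Illustrative stand-in only (assumed penalty, not the paper's).
    `groups` is a list of index arrays, one per covariate.
    """
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / n is the Lipschitz constant
    # of the gradient of the least-squares term.
    step = n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n      # gradient of the smooth loss
        z = b - step * grad               # plain gradient step
        b_new = z.copy()
        for g in groups:                  # proximal step, one group at a time
            b_new[g] = group_soft_threshold(z[g], step * lam)
        b = b_new
    return b
```

Called with a design matrix whose columns are grouped by covariate, the function returns a coefficient vector in which entire groups can be exactly zero, which is how a group-wise penalty performs variable selection. A penalty that also controls smoothness would change the proximal step or the smooth part of the objective.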
Taxonomy
Topics: Statistical Methods and Inference · Genetic Associations and Epidemiology · Gene expression and cancer classification
