Neural network facilitated ab initio derivation of linear formula: A case study on formulating the relationship between DNA motifs and gene expression
Chengyu Liu, Wei Wang

TL;DR
This paper introduces an interpretable neural network framework for deriving linear formulas that relate DNA motifs to gene expression, achieving predictive performance comparable to deep neural network models and uncovering motifs with important regulatory roles.
Contribution
It presents a novel approach using contextual regression for ab initio motif discovery and formula derivation, enhancing interpretability in biological modeling.
Findings
Identified 300 motifs with regulatory roles in gene expression.
Predicted gene expression levels with performance similar to deep neural networks.
Showed that these motifs also contribute significantly to cell-type-specific gene expression across 154 diverse cell types.
Abstract
Developing highly interpretable models, and even deriving formulas that quantify relationships in biological data, is an emerging need. Here we propose a framework for ab initio derivation of sequence motifs and a linear formula, using a new approach based on an interpretable neural network model called contextual regression. We showed that this linear model could predict gene expression levels from promoter sequences with performance comparable to deep neural network models. We uncovered a list of 300 motifs with important regulatory roles in gene expression and showed that they also contributed significantly to cell-type-specific gene expression in 154 diverse cell types. This work illustrates the possibility of deriving formulas to represent biological laws that may not be easily elucidated. (https://github.com/Wang-lab-UCSD/Motif_Finding_Contextual_Regression)
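To make the idea of a linear formula over motifs concrete, here is a minimal sketch, not the authors' implementation. It assumes each motif is a position weight matrix scored by its best sliding-window match on a promoter, and that expression is predicted as a weighted sum of those scores. The contextual variant follows the general idea of contextual regression as we understand it (weights produced from the input, prediction still a dot product). All function names, toy motifs, and weights below are hypothetical and come neither from the paper nor from its repository.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; bases outside ACGT become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def motif_features(seq, motifs):
    """Return, for each motif matrix, its best sliding-window match score on seq.

    A motif longer than the sequence gets a score of 0.0.
    """
    x = one_hot(seq)
    features = []
    for motif in motifs:
        width = motif.shape[0]
        scores = [float(np.sum(x[i:i + width] * motif))
                  for i in range(len(seq) - width + 1)]
        features.append(max(scores, default=0.0))
    return np.array(features)

def linear_formula(features, weights, bias):
    """Plain linear formula: expression = sum_k weights[k] * features[k] + bias."""
    return float(features @ weights + bias)

def contextual_prediction(features, W, b):
    """Contextual variant (assumed form): a one-layer map produces
    input-dependent weights, and the prediction is their dot product
    with the same features."""
    context_weights = W @ features + b
    return float(context_weights @ features)

# Toy usage with made-up motifs and coefficients.
toy_motifs = [one_hot("TATA"), one_hot("GGCC")]
toy_features = motif_features("ACGTATAAGG", toy_motifs)
toy_expression = linear_formula(toy_features, np.array([0.5, -0.25]), 0.1)
```

In this sketch the coefficient attached to each motif can be read directly as its contribution to the predicted expression, which is the kind of interpretability the abstract emphasizes.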
Taxonomy
Topics: Evolutionary Algorithms and Applications · Gene Regulatory Network Analysis · Machine Learning in Bioinformatics
