Towards a robust out-of-the-box neural network model for genomic data
Zhaoyi Zhang, Songyang Cheng, Claudia Solis-Lemus

TL;DR
This paper evaluates the robustness and transferability of neural network models for genomic data, highlighting recurrent neural networks' superior performance and identifying baseline characteristics for future model development.
Contribution
The paper analyzes the robustness of neural network models on genomic data, with emphasis on recurrent models, and proposes baseline features for out-of-the-box applications.
Findings
Recurrent neural networks outperform convolutional models in accuracy and transferability.
Neural networks face challenges due to biological data heterogeneity and modest sample sizes.
Certain model characteristics are identified as transferable across datasets.
Abstract
The accurate prediction of biological features from genomic data is paramount for precision medicine and sustainable agriculture. For decades, neural network models have been widely popular in fields like computer vision, astrophysics and targeted marketing given their prediction accuracy and their robust performance under big data settings. Yet neural network models have not made a successful transition into the medical and biological world due to the ubiquitous characteristics of biological data such as modest sample sizes, sparsity, and extreme heterogeneity. Here, we investigate the robustness, generalization potential and prediction accuracy of widely used convolutional neural network and natural language processing models with a variety of heterogeneous genomic datasets. Mainly, recurrent neural network models outperform convolutional neural network models in terms of prediction…
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research · Machine Learning in Bioinformatics · Gene expression and cancer classification
