Deep Learning for Metagenomic Data: using 2D Embeddings and Convolutional Neural Networks
Thanh Hai Nguyen, Yann Chevaleyre, Edi Prifti, Nataliya Sokolovska, and Jean-Daniel Zucker

TL;DR
This paper introduces a novel method to apply convolutional neural networks to metagenomic data by transforming it into 2D images, enabling disease prediction with promising results across multiple datasets.
Contribution
The paper demonstrates how to meaningfully map metagenomic data into 2D images for CNN application, addressing the challenge of non-image structured data in bioinformatics.
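As a rough illustration of what mapping a metagenomic sample to a 2D image can look like, the sketch below places a 1D feature vector (for example, relative abundances) row by row into the smallest square grid that holds it and pads the leftover cells. This is only a minimal sketch under stated assumptions: the summary does not specify the authors' actual mapping, and the function name `fill_up_image`, the row-by-row layout, and the zero padding are all hypothetical choices made for illustration.

```python
import math

import numpy as np


def fill_up_image(abundances, fill_value=0.0):
    """Arrange a 1D feature vector into a square 2D array, row by row.

    Assumption: features are placed in their given order; the paper's
    own ordering and layout may differ.
    """
    x = np.asarray(abundances, dtype=float)
    d = x.shape[0]

    # Smallest integer side length with side * side >= d.
    side = math.isqrt(d)
    if side * side < d:
        side += 1

    # Start from a flat, padded canvas, copy the features in, then reshape.
    image = np.full(side * side, fill_value, dtype=float)
    image[:d] = x
    return image.reshape(side, side)


# Hypothetical sample with five features, giving a 3 x 3 image.
sample = [0.5, 0.2, 0.1, 0.1, 0.1]
img = fill_up_image(sample)
print(img.shape)
```

Images built this way for N samples could be stacked into an array of shape (N, side, side, 1) and passed to a standard 2D convolutional network. How the features are ordered decides which ones end up as neighbours in the grid, so that ordering is the design choice that matters most for the convolution to be meaningful.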
Findings
Effective disease prediction across six datasets
Successful mapping of metagenomic data to 2D images
Potential of CNNs for bioinformatics prediction tasks
Abstract
Deep learning (DL) techniques have had unprecedented success when applied to images, waveforms, and text, to name a few. In general, when the sample size (N) is much greater than the number of features (d), DL outperforms previous machine learning (ML) techniques, often through the use of convolutional neural networks (CNNs). However, in many bioinformatics ML tasks, we encounter the opposite situation, where d is greater than N. In these situations, applying DL techniques (such as feed-forward networks) would lead to severe overfitting. Thus, sparse ML techniques (such as LASSO) usually yield the best results on these tasks. In this paper, we show how to apply CNNs to data that do not originally have an image structure (in particular, metagenomic data). Our first contribution is to show how to map metagenomic data in a meaningful way to 1D or 2D images. Based on this…
Taxonomy
Topics: Machine Learning in Bioinformatics · Gene expression and cancer classification · AI in cancer detection
Methods: Convolution
