Key Gene Mining in Transcriptional Regulation for Specific Biological Processes with Small Sample Sizes Using Multi-network pipeline Transformer
Kerui Huang, Jianhong Tian, Lei Sun, Li Zeng, Peng Xie, Aihua Deng, Ping Mo, Zhibo Zhou, Ming Jiang, Yun Wang, Xiaocheng Jiang

TL;DR
This paper introduces TransGeneSelector, a deep learning pipeline that effectively mines key regulatory genes from small transcriptome datasets, outperforming traditional methods and revealing genes crucial for seed germination.
Contribution
The study presents a novel multi-network deep learning approach combining data augmentation, filtering, and Transformer classification for gene mining in small samples.
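For orientation, the abstract names WGAN-GP as the data augmentation network. This summary does not give the training objective, so the block below is the standard WGAN-GP critic loss from the general literature, not necessarily the exact form used in TransGeneSelector. The symbols are assumptions for illustration: $D$ is the critic, $P_r$ the distribution of real expression profiles, $P_g$ the generator distribution, and $\lambda$ the penalty weight.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
```

The last term penalizes critic gradients whose norm deviates from 1 at points interpolated between real and generated samples. This tends to stabilize training, which is one plausible reason to choose this generator family for a small dataset.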
Findings
Classified Arabidopsis thaliana seed states (germinating vs. dry) on a 79-sample dataset with performance comparable to Random Forests.
Identified upstream regulatory genes involved in seed germination.
Validated key genes' roles through KEGG analysis and RT-qPCR.
Abstract
Gene mining is an important topic in the life sciences, but traditional machine learning methods cannot account for regulatory relationships between genes, and deep learning methods perform poorly on small samples. This study proposes a deep learning method, TransGeneSelector, that mines critical regulatory genes involved in specific life processes from a small-sample transcriptome dataset. The method combines a WGAN-GP data augmentation network, a sample filtering network, and a Transformer classifier network. It classified the state (germinating or dry) of Arabidopsis thaliana seeds in a dataset of 79 samples, with performance comparable to that of Random Forests. Further, using the SHapley Additive exPlanations (SHAP) method, TransGeneSelector mined genes involved in seed germination. Through the construction of gene…
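To make the SHAP-based gene ranking step concrete, here is a minimal sketch that computes exact Shapley values for a toy scoring function over three hypothetical genes. Everything in it is an assumption for illustration: `toy_model`, the expression values, and the all-zero baseline are invented, and exact enumeration is only feasible for a handful of features. The paper applies SHAP to a trained Transformer classifier, which this sketch does not reproduce.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical stand-in for a classifier score; NOT the paper's model.
def toy_model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


sample = [1.0, 3.0, 2.0]    # assumed expression values for three genes
baseline = [0.0, 0.0, 0.0]  # assumed reference profile for "absent" genes


def value_fn(present):
    # Genes outside the coalition are replaced by their baseline value.
    masked = [sample[i] if i in present else baseline[i]
              for i in range(len(sample))]
    return toy_model(masked)


phi = shapley_values(value_fn, len(sample))
# Rank genes by absolute contribution, most influential first.
ranking = sorted(range(len(sample)), key=lambda i: abs(phi[i]), reverse=True)
print(phi, ranking)
```

The attributions sum to the difference between the model output for the sample and for the baseline, which is the property that makes Shapley values usable as a per-gene importance score. Libraries such as `shap` approximate these values for models with far too many features to enumerate.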
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Machine Learning in Bioinformatics
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding
