Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification
Jack Lanchantin, Arshdeep Sekhon, Ritambhara Singh, Yanjun Qi

TL;DR
This paper introduces Prototype Matching Networks (PMN), a novel deep learning architecture that models transcription factor (TF) binding mechanisms and TF-TF interactions for large-scale multi-label genomic sequence classification, improving prediction accuracy over baseline models.
Contribution
The paper presents the first deep learning model combining prototype learning and TF-TF interaction modeling for large-scale transcription factor binding site (TFBS) prediction.
Findings
Significantly outperforms baseline models on a dataset with 2.1 million sequences.
Effectively models biological TF binding mechanisms and interactions.
Demonstrates the importance of prototype learning in genomics.
Abstract
One of the fundamental tasks in understanding genomics is the problem of predicting Transcription Factor Binding Sites (TFBSs). With hundreds of Transcription Factors (TFs) as labels, genomic-sequence based TFBS prediction is a challenging multi-label classification task. There are two major biological mechanisms for TF binding: (1) sequence-specific binding patterns on genomes known as "motifs" and (2) interactions among TFs known as co-binding effects. In this paper, we propose a novel deep architecture, the Prototype Matching Network (PMN), to mimic the TF binding mechanisms. Our PMN model automatically extracts prototypes ("motif"-like features) for each TF through a novel prototype-matching loss. Borrowing ideas from few-shot matching models, we use the notion of a support set of prototypes and an LSTM to learn how TFs interact and bind to genomic sequences. On a reference…
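The abstract only outlines how PMN works. As a rough illustration, here is a toy, pure-Python sketch of the prototype-matching step under several assumptions that are not taken from the paper: hand-written motif filters (built with a hypothetical one_hot helper and scanned by a hypothetical motif_scan) stand in for learned convolutional filters; cosine similarity is used as the matching function, as in few-shot matching networks; the per-TF prototypes and the scale and threshold parameters are made-up illustrative values; the LSTM that models TF-TF interactions is omitted; and plain multi-label binary cross-entropy stands in for the paper's prototype-matching loss, whose exact form is not given in this summary. All function and parameter names are hypothetical.

```python
import math

# Toy sketch under stated assumptions; NOT the authors' PMN implementation.
BASES = "ACGT"


def one_hot(seq):
    """One-hot encode DNA as rows of length 4 (A, C, G, T); other symbols such as N become all-zero rows."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def motif_scan(x, filt):
    """Slide a (width x 4) filter along the encoded sequence and return the best match score.

    Assumption: a hand-written filter replaces a learned convolution; this is a
    1D convolution followed by global max pooling."""
    w = len(filt)
    if len(x) < w:
        raise ValueError("sequence is shorter than the motif filter")
    return max(
        sum(x[i + k][j] * filt[k][j] for k in range(w) for j in range(4))
        for i in range(len(x) - w + 1)
    )


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(zs):
    """Numerically stable softmax."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(seq, filters, prototypes, scale=10.0, threshold=0.5):
    """Match one sequence against every TF prototype.

    Returns (attn, probs): a softmax over prototypes (matching-network style) and
    independent sigmoid probabilities, one per TF, for multi-label output.
    scale and threshold are hypothetical hyperparameters."""
    x = one_hot(seq)
    h = [motif_scan(x, f) for f in filters]      # sequence embedding: one score per motif
    sims = [cosine(h, p) for p in prototypes]    # similarity to each TF prototype
    attn = softmax([scale * s for s in sims])    # prototypes compete for attention
    probs = [sigmoid(scale * (s - threshold)) for s in sims]  # each TF scored independently
    return attn, probs


def multilabel_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over TF labels.

    Assumption: stands in for the paper's prototype-matching loss."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy usage: two hand-written motifs and two TF prototypes (all hypothetical).
filters = [one_hot("GAT"), one_hot("CCG")]
prototypes = [[1.0, 0.0], [0.0, 1.0]]  # TF 0 "prefers" GAT, TF 1 "prefers" CCG
attn, probs = predict("TTGATTT", filters, prototypes)
print("attention:", [round(a, 3) for a in attn])
print("P(bind):", [round(p, 3) for p in probs])
print("loss vs labels [1, 0]:", round(multilabel_bce(probs, [1, 0]), 3))
```

The softmax attention makes prototypes compete with one another, while the independent sigmoids allow several TFs to be predicted as bound to the same sequence, which is what the multi-label setting requires. In the toy run, the sequence contains GAT, so TF 0 should receive the higher binding probability.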
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
