Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction
Wei-Cheng Tseng, Po-Han Chi, Jia-Hua Wu, Min Sun

TL;DR
This paper introduces a novel approach combining unsupervised sequence embedding and convolutional neural networks to predict protein functions accurately using only sequence data, addressing data scarcity and large label space challenges.
Contribution
It proposes a method that outperforms existing techniques while requiring no additional bio-information (such as 3D protein structure) and without deleting rare protein functions from the label space.
Findings
Significantly outperforms the compared methods on the publicly available benchmark.
Uses only protein sequences without requiring 3D structural data.
Effectively manages large label space and rare functions.
Abstract
Accurate prediction of protein functions and properties is essential in the biotechnology industry, e.g. in drug development and artificial protein synthesis. The main challenges of protein function prediction are the large label space and the lack of labeled training data. Our method leverages unsupervised sequence embedding and the success of deep convolutional neural networks to overcome these challenges. In contrast, most existing methods delete the rare protein functions to reduce the label space. Furthermore, some existing methods require additional bio-information (e.g., the 3-dimensional structure of the proteins), which is difficult to determine in biochemical experiments. Our proposed method significantly outperforms the other methods on the publicly available benchmark using only protein sequences as input. This allows the process of identifying…
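The abstract names two ingredients: a sequence embedding and a convolutional network that maps the sequence alone to function labels. The sketch below is only an illustration of that general pipeline, not the authors' architecture. Everything in it is an assumption: the overlapping 3-mer tokenization, the `ToyEmbedding` table (random vectors standing in for embeddings that would be learned without labels), the filter sizes, the five hypothetical labels, and the untrained random weights. It shows how a sequence can be turned into one independent score per label, so rare labels do not have to be dropped from the output.

```python
import math
import random

def kmers(seq, k=3):
    """Split a protein sequence into overlapping k-mers (the tokens to embed)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

class ToyEmbedding:
    """Hypothetical stand-in for a pretrained, unsupervised k-mer embedding.
    Each new token simply gets a fixed random vector."""
    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def __call__(self, token):
        if token not in self.table:
            self.table[token] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.table[token]

def conv1d_relu(x, filters):
    """Slide each filter (width x dim) along the sequence axis, then apply ReLU."""
    width = len(filters[0])
    out = []
    for start in range(len(x) - width + 1):
        window = x[start:start + width]
        row = []
        for f in filters:
            s = sum(w * v for f_row, x_row in zip(f, window)
                    for w, v in zip(f_row, x_row))
            row.append(max(0.0, s))
        out.append(row)
    return out

def global_max_pool(feature_map):
    """Keep the strongest response of each filter over all positions."""
    return [max(col) for col in zip(*feature_map)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(seq, embed, filters, dense, k=3):
    """Return one independent score in (0, 1) per label (multi-label output)."""
    x = [embed(t) for t in kmers(seq, k)]
    pooled = global_max_pool(conv1d_relu(x, filters))
    return [sigmoid(sum(w * p for w, p in zip(row, pooled))) for row in dense]

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(1)
embed = ToyEmbedding(dim=8)
filters = [random_matrix(rng, 3, 8) for _ in range(4)]  # 4 filters of width 3
dense = random_matrix(rng, 5, 4)                        # 5 hypothetical labels
scores = predict("MKTAYIAKQRQISFVK", embed, filters, dense)
print([round(s, 3) for s in scores])
```

Because the weights are random, the printed scores carry no biological meaning. In a real system the embedding would be fitted on unlabeled sequences and the filters and output layer would be trained on labeled ones.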
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Machine Learning in Materials Science
