ProtVec: A Continuous Distributed Representation of Biological Sequences

Ehsaneddin Asgari; Mohammad R.K. Mofrad

arXiv:1503.05140·q-bio.QM·May 30, 2016

TL;DR

This paper introduces ProtVec, a neural network-based continuous vector representation for protein sequences, enabling improved classification, structure prediction, and disordered protein identification in bioinformatics.

Contribution

It presents a novel neural network approach that generates dense vector representations of protein sequences. These representations improve performance on several bioinformatics tasks compared with existing methods.
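The summary does not say how a protein sequence is turned into input for the neural network. The sketch below assumes a 3-gram scheme: each sequence is split into three lists of non-overlapping 3-grams, one per reading-frame offset, so that each list can serve as a "sentence" for a word2vec-style Skip-gram model. The function name split_to_ngrams and the toy sequence are hypothetical, and the exact procedure should be checked against the full paper.

```python
from typing import List


def split_to_ngrams(seq: str, n: int = 3) -> List[List[str]]:
    """Split seq into n lists of non-overlapping n-grams, one per offset.

    Assumed segmentation scheme for illustration, not taken from this page.
    """
    frames = []
    for offset in range(n):
        # Step by n so that n-grams within one list do not overlap.
        frames.append([seq[i:i + n] for i in range(offset, len(seq) - n + 1, n)])
    return frames


if __name__ == "__main__":
    # Toy sequence, not real protein data.
    for frame in split_to_ngrams("MKTAYIAKQR"):
        print(frame)
    # ['MKT', 'AYI', 'AKQ']
    # ['KTA', 'YIA', 'KQR']
    # ['TAY', 'IAK']
```

With n = 3, every overlapping 3-gram of the sequence lands in exactly one of the three lists, so using non-overlapping windows loses no local 3-gram context.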

Findings

01

Achieved 93% accuracy in protein family classification.

02

Distinguished disordered from structured proteins with up to 100% accuracy.

03

Outperformed existing family classification methods.

Abstract

We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot…
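The abstract states that a whole protein is represented by a single dense n-dimensional vector. One simple way to build such a vector, assumed here rather than taken from this page, is to sum the learned vectors of all 3-grams in the sequence. In the sketch below, EMBEDDINGS is a hypothetical three-dimensional toy table standing in for vectors that would normally come from a trained Skip-gram model, and protein_vector is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical toy embeddings: real 3-gram vectors would be learned by a
# Skip-gram model and would have many more dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "MKT": [1.0, 0.0, 0.0],
    "KTA": [0.0, 1.0, 0.0],
    "TAY": [0.0, 0.0, 1.0],
}


def protein_vector(seq: str, embeddings: Dict[str, List[float]],
                   n: int = 3, dim: int = 3) -> List[float]:
    """Sum the vectors of all overlapping n-grams in seq (assumed aggregation)."""
    total = [0.0] * dim
    for i in range(len(seq) - n + 1):
        vec = embeddings.get(seq[i:i + n])
        if vec is None:
            continue  # n-grams missing from the vocabulary are skipped
        total = [a + b for a, b in zip(total, vec)]
    return total


if __name__ == "__main__":
    # "AYW" is not in the toy table, so only MKT, KTA and TAY contribute.
    print(protein_vector("MKTAYW", EMBEDDINGS))  # [1.0, 1.0, 1.0]
```

Summing over all overlapping 3-grams gives the same result as summing over three shifted reading frames of non-overlapping 3-grams, because each start position falls in exactly one frame. Skipping unknown 3-grams is a choice made for this sketch; a real pipeline might map them to a dedicated "unknown" vector instead.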

Code & Models

Repositories

ehsanasgari/Deep-Proteomics
