ProDOMA: improve PROtein DOMAin classification for third-generation sequencing reads using deep learning

Du Nan; Jiayu Shang; Yanni Sun

arXiv:2009.12591 · q-bio.GN · July 9, 2021

TL;DR

ProDOMA is a deep learning tool designed to accurately classify protein domains directly from noisy third-generation sequencing reads, outperforming existing methods without requiring error correction.

Contribution

It introduces a novel deep neural network model that handles high-error long reads and incorporates an open-set approach for improved protein domain classification.

Findings

1. ProDOMA outperforms HMMER and DeepFam in accuracy.
2. It effectively classifies noisy long reads without error correction.
3. The model can reject noncoding or unrelated reads.
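The rejection behaviour in the last finding can be illustrated with a generic open-set rule: accept the top-scoring domain family only when the classifier is confident enough, and otherwise label the read as unknown. The sketch below is a minimal illustration under stated assumptions, not ProDOMA's actual implementation. The function names, the family labels, and the 0.9 threshold are all hypothetical, and maximum-softmax thresholding stands in for whatever open-set criterion the paper uses.

```python
import math
from typing import List, Optional, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw classifier scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_open_set(
    logits: List[float],
    labels: List[str],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> Tuple[Optional[str], float]:
    """Return (label, confidence), or (None, confidence) when the read is rejected.

    A read is rejected when the highest softmax probability falls below
    `threshold`, which is one simple way to flag noncoding or unrelated input.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None, probs[best]
    return labels[best], probs[best]


# Hypothetical placeholder family names, for illustration only.
families = ["family_A", "family_B", "family_C"]

# One score dominates, so the read is assigned to that family.
confident_label, confident_score = classify_open_set([8.0, 0.5, 0.1], families)

# Scores are nearly flat, so the read is rejected as unknown.
rejected_label, rejected_score = classify_open_set([1.0, 1.1, 0.9], families)
```

Raising the threshold rejects more reads and lowers the risk of assigning a noncoding read to a family; lowering it does the opposite. A suitable value would have to be tuned on held-out data.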

Abstract

Motivation: Third-generation sequencing technologies can produce DNA reads ranging from tens to hundreds of kb in length. These long reads allow protein domain annotation without assembly and can thus yield important insights into the biological functions of the underlying data. However, the high error rate of third-generation sequencing data poses a new challenge to established domain analysis pipelines. State-of-the-art methods are not optimized for noisy reads and classify domains in third-generation sequencing data with unsatisfactory accuracy. New computational methods are needed to improve domain prediction in long, noisy reads.

Results: In this work, we introduce ProDOMA, a deep learning model that conducts domain classification for third-generation sequencing reads. It uses deep neural networks with…

Code & Models

Repositories

strideradu/ProDOMA
PyTorch · Official
