FoldExplorer: Fast and Accurate Protein Structure Search with Sequence-Enhanced Graph Embedding

Yuan Liu; Hong-Bin Shen

arXiv:2311.18219 · q-bio.BM · December 1, 2023 · 2 citations

TL;DR

FoldExplorer is a deep learning-based tool that combines graph attention networks with protein language models to enable fast, accurate, and scalable protein structure search, outperforming existing methods by a reported 5-8%.

Contribution

It introduces a novel embedding approach for protein structures that combines a graph attention neural network with a protein language model, improving both search speed and accuracy.
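This summary does not say how FoldExplorer builds its graph or fuses the two inputs, so the following is only a minimal sketch of the general idea under stated assumptions: residues are nodes, edges link residues whose C-alpha atoms lie within a hypothetical 8 Å cutoff, node features stand in for per-residue language-model embeddings (toy numbers here), and one single-head graph attention layer in the style of Veličković et al. is followed by mean pooling to give a fixed-length structure embedding. The helper names (`contact_edges`, `gat_layer`, `mean_pool`) and all weights are illustrative, not taken from the paper.

```python
import math

def leaky_relu(x, slope=0.2):
    # LeakyReLU with negative slope 0.2, as in the original GAT paper
    return x if x > 0 else slope * x

def matvec(W, x):
    # Dense matrix-vector product; W is a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def contact_edges(coords, cutoff=8.0):
    # Hypothetical graph construction: residue i is linked to every residue j
    # whose C-alpha atom lies within `cutoff` angstroms (self-loops included).
    n = len(coords)
    return [[j for j in range(n) if math.dist(coords[i], coords[j]) <= cutoff]
            for i in range(n)]

def gat_layer(h, nbrs, W, a):
    # Single-head graph attention: project features, score each neighbor with
    # a^T [z_i || z_j], softmax over the neighborhood, then aggregate.
    z = [matvec(W, hi) for hi in h]
    out = []
    for i, js in enumerate(nbrs):
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in js]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        agg = [sum(al * z[j][k] for al, j in zip(alphas, js)) for k in range(len(z[i]))]
        out.append([max(0.0, v) for v in agg])  # ReLU here; the GAT paper uses ELU
    return out

def mean_pool(h):
    # Average residue vectors into one fixed-length structure embedding
    return [sum(col) / len(h) for col in zip(*h)]

# Toy input: four C-alpha positions (angstroms) and 2-D stand-ins for
# per-residue language-model embeddings.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
plm_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for illustration only
a = [0.1, 0.1, 0.1, 0.1]       # attention vector of length 2 * hidden_dim

nbrs = contact_edges(coords)
residue_out = gat_layer(plm_features, nbrs, W, a)
structure_embedding = mean_pool(residue_out)
```

Because every structure collapses to a vector of the same length regardless of sequence length, comparing two proteins reduces to comparing two vectors, which is what makes embedding-based search cheap relative to pairwise structural alignment.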

Findings

01 Achieves a 5-8% performance improvement over state-of-the-art methods.

02 Maintains high search speed on large-scale datasets.

03 Provides meaningful insights into the protein space.

Abstract

The advent of highly accurate protein structure prediction methods has fueled an exponential expansion of protein structure databases. Consequently, there is a rising demand for rapid and precise structural homolog search. Traditional alignment-based methods perform precise pairwise comparisons and achieve high accuracy. However, they are too slow to handle the massive volume of data now available. In response to this challenge, we propose FoldExplorer, a novel deep-learning approach. It harnesses graph attention neural networks and protein large language models to process protein structure and sequence data and generate embeddings for protein structures. These structural embeddings can be used for fast and accurate protein search, and they also provide insights into the protein space. FoldExplorer…
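The abstract states that the structural embeddings enable fast search but does not describe the retrieval step. A minimal sketch, assuming cosine similarity between fixed-length embeddings and an exhaustive top-k scan; `build_index`, `search`, and the structure IDs are hypothetical, and the vectors are toy values rather than FoldExplorer outputs.

```python
import heapq
import math

def normalize(v):
    # Scale to unit length so that a dot product equals cosine similarity
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def build_index(db):
    # Hypothetical index: normalize every database embedding once, up front
    return {sid: normalize(vec) for sid, vec in db.items()}

def search(index, query, k=3):
    # Brute-force scan: one dot product per database entry, then keep the top k
    q = normalize(query)
    scored = ((sum(x * y for x, y in zip(q, vec)), sid) for sid, vec in index.items())
    return heapq.nlargest(k, scored)

# Toy 3-D embeddings keyed by made-up structure IDs
db = {"A": [1.0, 0.0, 0.0], "B": [0.9, 0.1, 0.0], "C": [0.0, 1.0, 0.0], "D": [0.0, 0.0, 1.0]}
index = build_index(db)
hits = search(index, [1.0, 0.0, 0.0], k=2)  # list of (score, id), best first
```

Each query costs one d-dimensional dot product per database entry, independent of protein length. For very large databases, approximate nearest-neighbour libraries such as FAISS are a common replacement for the linear scan; whether FoldExplorer uses one is not stated in this summary.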

Taxonomy

Topics: Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks · Genomics and Phylogenetic Studies