NEAR: neural embeddings for amino acid relationships
Daniel Olson, Thomas Colligan, Daphne Demekas, Jack W Roddy, Ken Youens-Clark, Travis J Wheeler

TL;DR
NEAR is a new method using neural embeddings to improve the speed and accuracy of finding protein homologs in large databases.
Contribution
NEAR introduces a ResNet embedding model, trained with contrastive learning guided by trusted sequence alignments, for fast and accurate homology detection.
Findings
NEAR is more accurate than state-of-the-art protein language models while using less memory and running faster.
NEAR is at least 5x faster than HMMER3's pre-filter and outperforms other fast pHMM tools.
The model is effective as a high-speed pre-filter for sensitive protein annotation.
Abstract
Protein language models (PLMs) have recently demonstrated potential to supplant classical protein database search methods based on sequence alignment, but are slower than common alignment-based tools and appear to be prone to a high rate of false labeling. Here, we present Neural Embeddings for Amino acid Relationships (NEAR), a method based on neural representation learning that is designed to improve both speed and accuracy of search for likely homologs in a large protein sequence database. NEAR’s ResNet embedding model is trained using contrastive learning guided by trusted sequence alignments. It computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of residue-level k-NN search and a simple neighbor aggregation scheme. Tests on a benchmark consisting of trusted remote homologs and randomly shuffled…
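The search stage described in the abstract (per-residue embeddings, residue-level k-NN search, then neighbor aggregation) can be illustrated with a toy sketch. This is not NEAR's implementation: the function name `knn_aggregate`, the brute-force cosine-similarity search, and the sum-of-similarities aggregation rule are all assumptions chosen for illustration, and the embeddings here are hand-written arrays rather than the output of a trained model.

```python
import numpy as np

def knn_aggregate(query_emb, target_embs, k=2):
    """Toy scoring (hypothetical): for each query residue, find its k most
    similar target residues, then add each hit's similarity to the score of
    the target sequence that owns that residue."""
    # Stack all target residue embeddings, remembering which target owns each row.
    all_rows = np.vstack(target_embs)
    owner = np.concatenate(
        [np.full(len(t), i) for i, t in enumerate(target_embs)]
    )

    # L2-normalize so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = all_rows / np.linalg.norm(all_rows, axis=1, keepdims=True)

    sims = q @ t.T  # shape: (query residues, all target residues)

    # Brute-force k-NN: indices of the k most similar target residues
    # for each query residue.
    top = np.argsort(-sims, axis=1)[:, :k]

    # Neighbor aggregation: sum hit similarities per owning target.
    scores = np.zeros(len(target_embs))
    for qi in range(sims.shape[0]):
        for ti in top[qi]:
            scores[owner[ti]] += sims[qi, ti]
    return scores

if __name__ == "__main__":
    # Two made-up targets with 4-dimensional residue embeddings.
    target_a = np.eye(4)[:3]
    target_b = np.array([[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]])
    # A made-up query whose residues resemble the first two residues of target_a.
    query = np.array([[1.0, 0.1, 0.0, 0.0], [0.0, 1.0, 0.1, 0.0]])
    print(knn_aggregate(query, [target_a, target_b], k=1))
```

Targets with the highest aggregated scores would be passed on as alignment candidates. At database scale, the brute-force similarity matrix would presumably be replaced by an approximate nearest-neighbor index; that substitution is likewise an assumption, not something the abstract specifies.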
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
