# NEAR: neural embeddings for amino acid relationships

**Authors:** Daniel Olson, Thomas Colligan, Daphne Demekas, Jack W Roddy, Ken Youens-Clark, Travis J Wheeler

PMC · DOI: 10.1093/bioinformatics/btaf198 · 2025-07-15

## TL;DR

NEAR uses per-residue neural embeddings and k-NN search to find likely protein homologs in large sequence databases, and is evaluated mainly as a high-speed pre-filter for sensitive annotation tools such as HMMER3.

## Contribution

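NEAR introduces a ResNet embedding model, trained with contrastive learning guided by trusted sequence alignments, that is more accurate than state-of-the-art protein language models on a remote-homolog benchmark and faster than the pre-filter used in HMMER3.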

## Key findings

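- On a benchmark of trusted remote homologs and randomly shuffled decoys, NEAR is substantially more accurate than state-of-the-art protein language models, with lower memory requirements and faster embedding and search.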
- As a pre-filter, NEAR is at least 5x faster than the one used in HMMER3, and it also outperforms the pre-filter used in the authors' fast pHMM tool, nail.
- The model is effective as a high-speed pre-filter for sensitive protein annotation.

## Abstract

Protein language models (PLMs) have recently demonstrated potential to supplant classical protein database search methods based on sequence alignment, but are slower than common alignment-based tools and appear to be prone to a high rate of false labeling. Here, we present Neural Embeddings for Amino acid Relationships (NEAR), a method based on neural representation learning that is designed to improve both speed and accuracy of search for likely homologs in a large protein sequence database. NEAR’s ResNet embedding model is trained using contrastive learning guided by trusted sequence alignments. It computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of residue-level k-NN search and a simple neighbor aggregation scheme. Tests on a benchmark consisting of trusted remote homologs and randomly shuffled decoy sequences reveal that NEAR substantially improves accuracy relative to state-of-the-art PLMs, with lower memory requirements and faster embedding and search speed. While these results suggest that the NEAR model may be useful for standalone homology detection with increased sensitivity over standard alignment-based methods, in this manuscript, we focus on a more straightforward analysis of the model’s value as a high-speed pre-filter for sensitive annotation. In that context, NEAR is at least 5x faster than the pre-filter currently used in the widely used profile hidden Markov model (pHMM) search tool HMMER3, and also outperforms the pre-filter used in our fast pHMM tool, nail.

NEAR is under an open-source license. Code and data curation instructions can be found at https://github.com/TravisWheelerLab/NEAR.
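
The search pipeline described in the abstract (per-residue embeddings, residue-level k-NN search, then neighbor aggregation) can be illustrated with a minimal sketch. This is not NEAR's implementation: the function name `knn_candidates`, the brute-force cosine-similarity search, and the sum-of-similarities aggregation are all assumptions made for illustration, and the embeddings here are toy one-hot vectors rather than output of the ResNet model.

```python
import numpy as np

def knn_candidates(query_emb, target_embs, k=3):
    """Rank target sequences by aggregated residue-level k-NN hits.

    query_emb:   (Lq, d) array, one embedding per query residue.
    target_embs: list of (Lt_i, d) arrays, one per target sequence.
    Returns a list of (target_index, score), highest score first.
    Hypothetical helper; the aggregation rule is an assumption.
    """
    # Stack every target residue into one matrix and record which
    # target sequence each row came from.
    all_res = np.vstack(target_embs)
    owner = np.concatenate(
        [np.full(len(t), i) for i, t in enumerate(target_embs)]
    )

    # L2-normalise so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = all_res / np.linalg.norm(all_res, axis=1, keepdims=True)

    # Brute-force similarity matrix (assumed stand-in for a real k-NN index).
    sims = q @ t.T  # shape (Lq, total_target_residues)
    k = min(k, sims.shape[1])

    # Indices of the k most similar target residues per query residue.
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]

    # Aggregate: each neighbor adds its similarity to its owning sequence.
    scores = np.zeros(len(target_embs))
    rows = np.arange(sims.shape[0])[:, None]
    np.add.at(scores, owner[top], sims[rows, top])

    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]

# Toy data: two targets with orthogonal residue embeddings; the query
# copies two residues of the second target.
eye = np.eye(7)
targets = [eye[0:3], eye[3:7]]
query = eye[[4, 5]]
print(knn_candidates(query, targets, k=2))
```

In this toy case the second target ranks first because both query residues find an exact match in it. A production pre-filter would replace the dense similarity matrix with an approximate nearest-neighbor index, since the matrix grows with the total number of residues in the database.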

## Full-text entities

- **Chemicals:** Amino acid (MESH:D000596)

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12261438/full.md

---
Source: https://tomesphere.com/paper/PMC12261438