Neural Distance Embeddings for Biological Sequences
Gabriele Corso, Rex Ying, Michal Pándy, Petar Veličković, Jure Leskovec, Pietro Liò

TL;DR
NeuroSEED is a general framework for embedding biological sequences in geometric vector spaces. Hyperbolic space captures the hierarchical relationships in the data best, improving accuracy and efficiency over existing methods.
Contribution
The paper presents NeuroSEED, a geometric embedding framework in which distances between sequence embeddings reflect evolutionary (edit) distance. Embedding in hyperbolic space models the hierarchical structure of real-world datasets more accurately than traditional Euclidean approaches.
Findings
22% average reduction in embedding RMSE with hyperbolic space, measured against the best competing geometry
Significant accuracy improvements on bioinformatics tasks
Up to 30x faster hierarchical clustering methods
Abstract
The development of data-dependent heuristics and representations for biological sequences that reflect their evolutionary distance is critical for large-scale biological research. However, popular machine learning approaches, based on continuous Euclidean spaces, have struggled with the discrete combinatorial formulation of the edit distance that models evolution and the hierarchical relationship that characterises real-world datasets. We present Neural Distance Embeddings (NeuroSEED), a general framework to embed sequences in geometric vector spaces, and illustrate the effectiveness of the hyperbolic space that captures the hierarchical structure and provides an average 22% reduction in embedding RMSE against the best competing geometry. The capacity of the framework and the significance of these improvements are then demonstrated devising supervised and unsupervised NeuroSEED…
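The quantities named in the abstract (edit distance, hyperbolic distance, embedding RMSE) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `scale` parameter, the use of the Poincaré ball model, and the hand-written embeddings are all assumptions made for illustration. In a learned setting, an encoder network would produce the embedding vectors.

```python
import math
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (rolling row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def poincare_distance(u, v) -> float:
    """Geodesic distance in the Poincare ball (points must have norm < 1)."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v)))


def embedding_rmse(seqs, embeddings, scale=1.0) -> float:
    """RMSE between scaled embedding distances and true edit distances.

    `scale` is a hypothetical factor mapping embedding distance
    to edit-distance units; it is not taken from the paper.
    """
    errors = []
    for i, j in combinations(range(len(seqs)), 2):
        true_d = edit_distance(seqs[i], seqs[j])
        pred_d = scale * poincare_distance(embeddings[i], embeddings[j])
        errors.append((pred_d - true_d) ** 2)
    return math.sqrt(sum(errors) / len(errors))


# Toy example: hand-picked 2-D points standing in for learned embeddings.
seqs = ["ACGT", "AGT", "ACGTT"]
embeddings = [[0.0, 0.0], [0.3, 0.0], [0.0, 0.3]]
rmse = embedding_rmse(seqs, embeddings, scale=2.0)
```

Distances in the Poincaré ball grow rapidly near the boundary, which is what lets a low-dimensional hyperbolic space accommodate tree-like (hierarchical) data with less distortion than a Euclidean space of the same dimension.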
