HyperEvoGen: Exploring deep phylogeny using non-Euclidean variational inference
Jason Lamanna, Erfan Mowlaei, Xinghua Shi, Sudhir Kumar, Vincenzo Carnevale

TL;DR
HyperEvoGen introduces a hyperbolic variational autoencoder with adversarial training for improved modeling of protein evolution, capturing long-range co-evolutionary patterns and phylogenetic relationships more accurately.
Contribution
The paper presents a novel hyperbolic latent space model that better preserves evolutionary distances and hierarchical structures in protein sequence data.
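Hyperbolic embeddings preserve hierarchies because distances grow rapidly toward the boundary of the space, much as the number of leaves grows with depth in a tree. The sketch below implements the textbook geodesic distance on the unit Poincaré ball (curvature -1), d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). It is an illustration only. The summary does not state HyperEvoGen's curvature or parameterization, and the function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Assumes curvature -1; this is the standard formula, not code from HyperEvoGen.
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Argument of arcosh; the denominators shrink toward 0 near the boundary,
    # which is what makes distances blow up there.
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)


origin = (0.0, 0.0)
near_edge_a = (0.95, 0.0)   # hypothetical "leaf" embeddings close to the boundary
near_edge_b = (0.0, 0.95)

print(poincare_distance(origin, near_edge_a))       # root-to-leaf distance
print(poincare_distance(near_edge_a, near_edge_b))  # leaf-to-leaf distance
```

The two near-boundary points are only about 1.34 apart in Euclidean terms, but their hyperbolic distance is about 6.6. That is close to the roughly 7.3 units of a path routed through the origin. This tree-like behaviour is the usual argument for placing hierarchical data such as phylogenies in hyperbolic space.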
Findings
Outperforms traditional methods in ancestral sequence reconstruction.
Provides higher-quality sequence generation with less training time.
Accurately models deep evolutionary divergence in benchmarks.
Abstract
Homologous proteins evolve from a common ancestral sequence, constrained by intricate patterns of co-evolving residues. Accurate reconstruction of evolutionary histories remains a challenge, primarily because existing approaches cannot capture long-range co-evolutionary ties and lack a precise metric for the evolutionary distance between sequences. Standard approaches rely on the p-distance or on substitution-corrected measures such as Jukes-Cantor. These measures saturate under deep evolutionary divergence, losing all evolutionary signal once enough time has elapsed. We present HyperEvoGen, a Poincaré variational autoencoder with adversarial training, hyperbolic latent geometry, and a compound loss function that learns evolutionarily meaningful representations from single-family alignments. The arrangement of protein sequences in HyperEvoGen's hyperbolic embedding…
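The saturation problem mentioned in the abstract can be made concrete with the two standard measures it names. The sketch below computes the p-distance, which is the fraction of mismatched aligned sites, and applies a Jukes-Cantor-style correction d = -b ln(1 - p/b), where b = (k - 1)/k for an alphabet of k states. It makes three assumptions of its own: k = 20 for amino acids, pairwise deletion of gapped columns, and the helper names `p_distance` and `jukes_cantor`. These are textbook definitions, not the paper's code.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of mismatched sites, skipping columns where either sequence has a gap.

    Gap handling (pairwise deletion) is an assumption made for this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        mismatches += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor(p, n_states=20):
    """Jukes-Cantor-style correction for an alphabet of n_states (20 for amino acids)."""
    b = (n_states - 1) / n_states
    if p >= b:
        # Saturated: the observed differences are what unrelated random sequences would show.
        return math.inf
    return -b * math.log(1.0 - p / b)


# As p approaches 19/20, the corrected distance diverges.
for p in (0.1, 0.5, 0.8, 0.9, 0.94, 0.95):
    print(f"p = {p:.2f}  ->  corrected distance = {jukes_cantor(p):.2f}")
```

Because the correction diverges as p approaches 19/20, small sampling fluctuations in p at large divergences translate into enormous uncertainty in the inferred distance. Beyond that point, the measure carries no information at all. This is the saturation that motivates looking for a different metric.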
