HyEm: Query-Adaptive Hyperbolic Retrieval for Biomedical Ontologies via Euclidean Vector Indexing
Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin

TL;DR
HyEm introduces a hybrid retrieval method combining hyperbolic and Euclidean embeddings to improve hierarchy-aware biomedical ontology retrieval while maintaining compatibility with existing vector databases.
Contribution
HyEm provides a novel lightweight retrieval layer that integrates hyperbolic embeddings into Euclidean ANN systems with query-adaptive mixing, enhancing hierarchy-aware retrieval.
Findings
HyEm preserves 94-98% of Euclidean baseline performance on entity-centric queries.
HyEm significantly improves hierarchy navigation and mixed-intent query handling.
HyEm maintains indexability at moderate oversampling levels.
Abstract
Retrieval-augmented generation (RAG) for biomedical knowledge faces a hierarchy-aware ontology grounding challenge: resources like HPO, DO, and MeSH use deep "is-a" taxonomies, yet production stacks rely on Euclidean embeddings and ANN indexes. While hyperbolic embeddings suit hierarchical representation, they face two barriers: (i) lack of native vector database support, and (ii) risk of underperforming on entity-centric queries where hierarchy is irrelevant. We present HyEm, a lightweight retrieval layer integrating hyperbolic ontology embeddings into existing Euclidean ANN infrastructure. HyEm learns radius-controlled hyperbolic embeddings, stores origin log-mapped vectors in standard Euclidean databases for candidate retrieval, then applies exact hyperbolic reranking. A query-adaptive gate outputs continuous mixing weights, combining Euclidean semantic similarity with hyperbolic…
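The pipeline described in the abstract (log-map to a Euclidean store, oversampled candidate retrieval, exact hyperbolic reranking, gated score mixing) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the unit Poincaré ball with curvature -1, uses a brute-force scan as a stand-in for an ANN index, and treats the gate output as a plain scalar `alpha`; the names `log0`, `exp0`, `poincare_dist`, `retrieve`, and `mixed_score` are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def log0(x):
    """Origin log map: point in the unit Poincare ball -> tangent vector at 0."""
    r = norm(x)
    if r == 0.0:
        return list(x)
    scale = math.atanh(r) / r
    return [scale * xi for xi in x]


def exp0(v):
    """Origin exp map: tangent vector at 0 -> point in the unit Poincare ball."""
    r = norm(v)
    if r == 0.0:
        return list(v)
    scale = math.tanh(r) / r
    return [scale * vi for vi in v]


def poincare_dist(x, y):
    """Exact geodesic distance on the unit Poincare ball (curvature -1)."""
    diff2 = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - norm(x) ** 2) * (1.0 - norm(y) ** 2)
    return math.acosh(1.0 + 2.0 * diff2 / denom)


def retrieve(query_ball, index, k, oversample=4):
    """Two-stage retrieval over `index`, a dict of id -> stored tangent vector.

    Stage 1 ranks by Euclidean distance in tangent space (brute force here,
    standing in for an ANN query) and keeps k * oversample candidates.
    Stage 2 reranks those candidates by exact hyperbolic distance.
    """
    q_tan = log0(query_ball)
    by_euclid = sorted(index, key=lambda i: math.dist(q_tan, index[i]))
    candidates = by_euclid[: k * oversample]
    reranked = sorted(
        candidates, key=lambda i: poincare_dist(query_ball, exp0(index[i]))
    )
    return reranked[:k]


def mixed_score(euclid_sim, hyp_dist, alpha):
    """Assumed mixing rule: alpha weights Euclidean similarity, (1 - alpha)
    weights negative hyperbolic distance. alpha would come from the gate."""
    return alpha * euclid_sim - (1.0 - alpha) * hyp_dist


# Toy ontology embeddings in a 2-D ball; larger radius means deeper in the tree.
ball_points = {
    "root": [0.0, 0.0],
    "a": [0.5, 0.0],
    "b": [0.9, 0.0],
    "c": [0.0, 0.5],
}
# Only the log-mapped vectors would be stored in the Euclidean vector database.
index = {name: log0(p) for name, p in ball_points.items()}
top = retrieve([0.85, 0.0], index, k=1)
```

With `alpha` near 1 the mixed score reduces to the Euclidean baseline, which is one way to read the claim that entity-centric performance is largely preserved; with `alpha` near 0 ranking is driven by hyperbolic distance. The `oversample` factor controls how many Euclidean candidates reach the exact reranking step.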
