GATHER: Convergence-Centric Hyper-Entity Retrieval for Zero-Shot Cell-Type Annotation
Zhonghui Zhang, Feng Jiang, Shaowei Qin, Jiahao Zhao, Min Yang

TL;DR
GATHER is a retrieval method that identifies convergence points in a biological knowledge graph to improve zero-shot cell-type annotation from queries containing many genes, reducing LLM calls and improving accuracy.
Contribution
The paper introduces a convergence-centric graph traversal approach that captures synergy among input entities through topological convergence points, and it reports outperforming existing methods in cell-type annotation.
Findings
GATHER achieves higher accuracy than the baselines while making fewer LLM calls.
Convergence nodes effectively summarize multi-entity signals in biological data.
The method demonstrates scalability and efficiency in hyper-entity retrieval.
Abstract
Zero-shot single-cell cell-type annotation aims to determine a cell's type from a given set of expressed genes without any training. Existing knowledge-graph-based RAG approaches retrieve evidence by expanding from source entities and relying on iterative LLM reasoning. However, in this setting each query contains tens to hundreds of genes, where no single gene is decisive and the label emerges only from their collective co-occurrence. Such hyper-entity queries fundamentally challenge local, entity-wise exploration strategies, which reason from individual genes, leading to poor scalability and substantial LLM cost. We propose GATHER (Graph-Aware Traversal with Hyper-Entity Retrieval), a convergence-centric retriever tailored to hyper-entity queries. It performs global multi-source graph traversal and identifies topological convergence points -- nodes jointly reachable from many input…
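The idea of a convergence point, a node jointly reachable from many input entities, can be illustrated with a small sketch. This is not the authors' implementation: the hop limit `max_hops`, the threshold `min_support`, the ranking by number of supporting sources, and the toy graph with its placeholder node names are all assumptions made for illustration. The sketch runs a bounded breadth-first search from every source entity and counts how many sources reach each node.

```python
from collections import Counter, deque


def reachable_within(graph, source, max_hops):
    """Nodes reachable from `source` in at most `max_hops` edges (source excluded)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(source)
    return seen


def convergence_points(graph, sources, max_hops=2, min_support=2):
    """Rank non-source nodes by how many distinct sources reach them.

    `max_hops` and `min_support` are hypothetical parameters for this sketch.
    """
    source_set = set(sources)
    support = Counter()
    for source in source_set:
        support.update(reachable_within(graph, source, max_hops))
    ranked = [
        (node, count)
        for node, count in support.items()
        if count >= min_support and node not in source_set
    ]
    # Highest support first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy adjacency list with placeholder names (not real biological data).
toy_graph = {
    "gene_a": ["pathway_x", "cell_type_1"],
    "gene_b": ["pathway_x"],
    "gene_c": ["cell_type_2"],
    "pathway_x": ["cell_type_1"],
}

result = convergence_points(toy_graph, ["gene_a", "gene_b", "gene_c"])
print(result)  # [('cell_type_1', 2), ('pathway_x', 2)]
```

In this toy case, `cell_type_1` and `pathway_x` are each reached by two of the three sources, while `cell_type_2` is reached by only one and is filtered out. Because every source is traversed once and no per-gene LLM reasoning is involved, the traversal cost grows with the graph and the query size rather than with the number of LLM calls.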
