GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases
Zhizheng Wang, Qiao Jin, Chih-Hsuan Wei, Shubo Tian, Po-Ting Lai, Qingqing Zhu, Chi-Ping Day, Christina Ross, Zhiyong Lu

TL;DR
GeneAgent is a novel language agent that autonomously interacts with biological databases to improve gene set knowledge discovery, significantly reducing hallucinations and providing more reliable insights compared to standard LLMs.
Contribution
The paper introduces a self-verification language agent for gene set analysis that outperforms standard GPT-4 and reduces hallucinations by checking its output against domain knowledge from biological databases.
Findings
GeneAgent outperforms standard GPT-4 on a benchmark of 1,106 gene sets from different sources.
Self-verification reduces hallucinations in gene analysis.
Application to mouse melanoma gene sets yields novel insights.
Abstract
Gene set knowledge discovery is essential for advancing human functional genomics. Recent studies have shown promising performance by harnessing the power of Large Language Models (LLMs) on this task. Nonetheless, their results are subject to several limitations common in LLMs such as hallucinations. In response, we present GeneAgent, a first-of-its-kind language agent featuring self-verification capability. It autonomously interacts with various biological databases and leverages relevant domain knowledge to improve accuracy and reduce hallucination occurrences. Benchmarking on 1,106 gene sets from different sources, GeneAgent consistently outperforms standard GPT-4 by a significant margin. Moreover, a detailed manual review confirms the effectiveness of the self-verification module in minimizing hallucinations and generating more reliable analytical narratives. To demonstrate its…
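The self-verification idea described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the in-memory dictionary stands in for the external biological databases, and every name here (`TOY_DATABASE`, `Claim`, `verify_claim`, `self_verify`) is hypothetical. The sketch only shows the general pattern of checking each drafted claim against a curated source and separating supported claims from unsupported ones.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a curated gene-function database. A real agent
# would query external resources instead of this in-memory dictionary.
TOY_DATABASE = {
    "TP53": {"apoptosis", "dna damage response"},
    "BAX": {"apoptosis"},
    "MDM2": {"dna damage response"},
}


@dataclass
class Claim:
    """One drafted statement: a gene and the function attributed to it."""
    gene: str
    function: str


def verify_claim(claim, database):
    """Return True if the database lists the claimed function for the gene."""
    return claim.function.lower() in database.get(claim.gene, set())


def self_verify(claims, database):
    """Split drafted claims into supported and unsupported lists."""
    supported, unsupported = [], []
    for claim in claims:
        if verify_claim(claim, database):
            supported.append(claim)
        else:
            unsupported.append(claim)
    return supported, unsupported


# Illustrative draft, as a language model might produce for a small gene set.
draft = [
    Claim("TP53", "apoptosis"),
    Claim("BAX", "apoptosis"),
    Claim("MDM2", "glycolysis"),  # not backed by the toy database
]
supported, unsupported = self_verify(draft, TOY_DATABASE)
```

In this sketch, unsupported claims are only flagged. An agent following the same pattern could instead revise or drop them before producing its final narrative.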
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks · Computational Drug Discovery Methods
Methods: Sparse Evolutionary Training · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout
