LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs
Vincent Emonet, Jerven Bolleman, Severine Duvaud, Tarcisio Mendes de Farias, Ana Claudia Sima

TL;DR
This paper presents a Retrieval-Augmented Generation system that uses Large Language Models and KG metadata to accurately translate natural language questions into federated SPARQL queries, with validation to improve correctness.
Contribution
It introduces a novel RAG-based approach combining LLMs and KG metadata for precise federated query generation with validation, enhancing accuracy over existing methods.
Findings
Improved accuracy in federated SPARQL query generation
Reduced hallucinations through validation step
System availability online at chat.expasy.org
Abstract
We introduce a Retrieval-Augmented Generation (RAG) system for translating user questions into accurate federated SPARQL queries over bioinformatics knowledge graphs (KGs) leveraging Large Language Models (LLMs). To enhance accuracy and reduce hallucinations in query generation, our system utilises metadata from the KGs, including query examples and schema information, and incorporates a validation step to correct generated queries. The system is available online at chat.expasy.org.
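The pipeline described in the abstract (retrieve KG metadata, assemble a prompt, validate the generated query) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ex:` terms, the example questions, the token-overlap retrieval, and all function names are hypothetical, and the LLM call itself is left out.

```python
import re

# Hypothetical KG metadata: example question/query pairs and known schema terms.
# The ex: prefix and every term below are illustrative, not taken from a real KG.
EXAMPLES = [
    {"question": "Which proteins are associated with a disease?",
     "query": "SELECT ?protein WHERE { ?protein ex:associatedWith ?disease . }"},
    {"question": "Which genes are expressed in the brain?",
     "query": "SELECT ?gene WHERE { ?gene ex:expressedIn ?anat . }"},
]
KNOWN_TERMS = {"ex:associatedWith", "ex:expressedIn"}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_examples(question, examples, k=1):
    """Rank example questions by token overlap with the user question.

    Token overlap is an assumed stand-in for whatever similarity search
    a real system would use.
    """
    q = tokenize(question)
    ranked = sorted(examples,
                    key=lambda ex: len(q & tokenize(ex["question"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, examples, known_terms):
    """Assemble the retrieved metadata and the question into one prompt string."""
    parts = ["Known schema terms: " + ", ".join(sorted(known_terms))]
    for ex in examples:
        parts.append("Example question: " + ex["question"])
        parts.append("Example query: " + ex["query"])
    parts.append("Question: " + question)
    return "\n".join(parts)


def find_unknown_terms(query, known_terms):
    """Return prefixed names used in the query that the schema does not list."""
    used = set(re.findall(r"\b[A-Za-z][\w-]*:[A-Za-z_][\w-]*", query))
    return sorted(used - known_terms)


question = "Which genes are expressed in the liver?"
prompt = build_prompt(question, retrieve_examples(question, EXAMPLES), KNOWN_TERMS)
# A generated query would be checked before execution; unknown terms could be
# fed back to the model as a correction hint.
issues = find_unknown_terms("SELECT ?g WHERE { ?g ex:madeUp ?x . }", KNOWN_TERMS)
```

In this sketch, validation only checks vocabulary against the schema; the abstract does not specify how the actual validation step works.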
Taxonomy
Topics: Semantic Web and Ontologies
