Text2Cypher: Bridging Natural Language and Graph Databases
Makbule Gulcin Ozsoy, Leila Messallem, Jon Besga, Gianandrea Minneci

TL;DR
This paper presents Text2Cypher, a system that translates natural language queries into Cypher for knowledge graphs, leveraging fine-tuned large language models trained on a new, high-quality dataset to improve accuracy and usability for non-experts.
Contribution
The work introduces a large, curated dataset of 44,387 instances for fine-tuning LLMs to translate natural language into Cypher queries. Models fine-tuned on this dataset outperform baseline models.
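To make the idea of a fine-tuning instance concrete, the following is a minimal sketch of how one (question, schema, Cypher) triple might be turned into a prompt/completion pair. The field names, the prompt wording, the `Instance` class, and the movie example are all hypothetical illustrations; the summary does not specify the dataset's actual format.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One hypothetical training example (field names are assumptions)."""
    question: str  # natural language question
    schema: str    # textual description of the graph schema
    cypher: str    # reference Cypher query


def build_prompt(inst: Instance) -> str:
    """Assemble an instruction-style prompt; the wording is illustrative only."""
    return (
        "Task: Generate a Cypher statement for the question below.\n"
        f"Schema: {inst.schema}\n"
        f"Question: {inst.question}\n"
        "Cypher:"
    )


# Illustrative instance, not taken from the paper's dataset.
example = Instance(
    question="Which actors appeared in The Matrix?",
    schema="(:Person {name})-[:ACTED_IN]->(:Movie {title})",
    cypher=(
        "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'The Matrix'}) "
        "RETURN p.name"
    ),
)

prompt = build_prompt(example)
# The model is trained to continue the prompt with the reference query.
pair = {"prompt": prompt, "completion": " " + example.cypher}
```

Including the schema in the prompt is a common design choice for text-to-query tasks, since the model cannot otherwise know which labels and relationship types exist in the target graph.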
Findings
Fine-tuned models outperform baselines in BLEU and Exact Match scores.
A high-quality dataset significantly improves translation accuracy.
Effective dataset preparation is crucial for LLM performance in domain-specific tasks.
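The two metrics named in the findings can be sketched as follows. This is a simplified illustration, not the paper's evaluation code: whitespace tokenization and add-one smoothing are assumptions made here so the example stays self-contained, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(query: str) -> list[str]:
    # Assumption: split on whitespace, so spacing differences are ignored.
    return query.split()


def exact_match(pred: str, ref: str) -> float:
    """1.0 if the token sequences are identical, else 0.0."""
    return float(tokenize(pred) == tokenize(ref))


def ngrams(tokens: list[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(pred: str, ref: str, max_n: int = 4) -> float:
    """Sentence-level BLEU against one reference, with add-one smoothing."""
    p, r = tokenize(pred), tokenize(ref)
    if not p:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        p_counts, r_counts = ngrams(p, n), ngrams(r, n)
        # Clipped overlap: each predicted n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, r_counts[g]) for g, c in p_counts.items())
        total = sum(p_counts.values())
        # Add-one smoothing (an assumption) avoids log(0) on short queries.
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty discourages predictions shorter than the reference.
    bp = 1.0 if len(p) >= len(r) else math.exp(1 - len(r) / len(p))
    return bp * math.exp(log_prec)


reference = "MATCH (n:Person) RETURN n.name"
same = "MATCH  (n:Person)   RETURN n.name"
different = "MATCH (n:Person) RETURN n.age"

em_same, bleu_same = exact_match(same, reference), bleu(same, reference)
em_diff, bleu_diff = exact_match(different, reference), bleu(different, reference)
```

Exact Match is strict: a query that differs in a single token scores zero, while BLEU gives partial credit for shared n-grams. Neither metric checks whether two differently written queries return the same results.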
Abstract
Knowledge graphs use nodes, relationships, and properties to represent arbitrarily complex data. When these graphs are stored in a graph database, the Cypher query language enables efficient modeling and querying. However, using Cypher requires specialized knowledge, which can present a challenge for non-expert users. Our work Text2Cypher aims to bridge this gap by translating natural language queries into the Cypher query language, extending the utility of knowledge graphs to users without technical expertise. While large language models (LLMs) can be used for this purpose, they often struggle to capture complex nuances, resulting in incomplete or incorrect outputs. Fine-tuning LLMs on domain-specific datasets has proven to be a more promising approach, but the scarcity of high-quality, publicly available Text2Cypher datasets makes this challenging. In this work, we show how…
