Text2Cypher: Bridging Natural Language and Graph Databases

Makbule Gulcin Ozsoy; Leila Messallem; Jon Besga; Gianandrea Minneci

arXiv:2412.10064 · cs.LG · December 16, 2024

TL;DR

This paper presents Text2Cypher, which translates natural language questions into Cypher queries over knowledge graphs. It fine-tunes large language models on a new, high-quality dataset so that non-experts can query graph databases more accurately.

Contribution

The work introduces a large, curated dataset of 44,387 instances for fine-tuning LLMs to translate natural language into Cypher queries. Models fine-tuned on this dataset outperform baseline models.

Findings

01. Fine-tuned models outperform baselines in BLEU and Exact Match scores.
02. A high-quality dataset significantly improves translation accuracy.
03. Effective dataset preparation is crucial for LLM performance in domain-specific tasks.
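
To make the Exact Match metric in the findings concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's evaluation code: the function names and the normalization (collapsing whitespace and dropping a trailing semicolon) are hypothetical choices, and the page does not say how the authors normalize queries before comparison. BLEU is not implemented here.

```python
import re


def normalize_cypher(query: str) -> str:
    """Collapse whitespace runs and drop a trailing semicolon.

    Assumption: this light normalization is illustrative only; the
    source does not specify how queries are normalized.
    """
    collapsed = re.sub(r"\s+", " ", query).strip()
    return collapsed.rstrip(";").strip()


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions identical to their reference
    after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        normalize_cypher(pred) == normalize_cypher(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

Exact Match is strict: two queries that return the same rows but differ in variable names or clause order count as a miss, which is one reason it is usually reported alongside an overlap metric such as BLEU.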

Abstract

Knowledge graphs use nodes, relationships, and properties to represent arbitrarily complex data. When knowledge graphs are stored in a graph database, the Cypher query language enables efficient modeling and querying of them. However, using Cypher requires specialized knowledge, which can present a challenge for non-expert users. Our work, Text2Cypher, aims to bridge this gap by translating natural language queries into the Cypher query language, extending the utility of knowledge graphs to non-technical expert users. While large language models (LLMs) can be used for this purpose, they often struggle to capture complex nuances, resulting in incomplete or incorrect outputs. Fine-tuning LLMs on domain-specific datasets has proven to be a more promising approach, but the limited availability of high-quality, publicly available Text2Cypher datasets makes this challenging. In this work, we show how…
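
As a rough illustration of the task the abstract describes, the Python sketch below pairs a natural language question with a Cypher query and formats the pair for fine-tuning. Everything in it is an assumption made for illustration: the schema, the question, the prompt template, and the field names are hypothetical and are not taken from the paper's dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text2CypherExample:
    """One hypothetical instance: graph schema, question, target query."""
    schema: str
    question: str
    cypher: str


# Hypothetical prompt layout; the paper's actual template is not shown here.
PROMPT_TEMPLATE = (
    "Translate the question into a Cypher query.\n"
    "Schema:\n{schema}\n"
    "Question: {question}\n"
    "Cypher:"
)


def build_prompt(schema: str, question: str) -> str:
    """Fill the template with the graph schema and the user question."""
    return PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())


def to_training_pair(example: Text2CypherExample) -> dict[str, str]:
    """Turn an instance into a prompt/completion pair for supervised tuning."""
    return {
        "prompt": build_prompt(example.schema, example.question),
        "completion": " " + example.cypher.strip(),
    }


# Illustrative movie-graph instance (invented for this sketch).
EXAMPLE = Text2CypherExample(
    schema="(:Person {name})-[:ACTED_IN]->(:Movie {title, released})",
    question="Which movies did Tom Hanks act in?",
    cypher=(
        "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) "
        "RETURN m.title"
    ),
)
```

Including the schema in the prompt lets the model ground node labels, relationship types, and property names in the target graph instead of guessing them.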
