Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav, Masoud Hashemi, Sathwik Tejaswi Madhusudhan

TL;DR
This paper introduces Auto-Cypher, a framework that uses LLM supervision to generate high-quality Cypher query data, significantly improving LLM performance on Cypher generation tasks for graph databases.
Contribution
The paper presents a novel LLM-supervised pipeline for synthetic Cypher data generation, enhancing the training of LLMs for graph database query translation.
Findings
Generated 29.8k high-quality Cypher instances across domains
Training LLMs on SynthCypher improves performance by up to 40%
Achieved 30% performance gains on the SPIDER benchmark
Abstract
Graph databases like Neo4j are gaining popularity for handling complex, interconnected data, offering advantages over traditional relational databases in modeling and querying relationships. While translating natural language into SQL queries is well-researched, generating Cypher queries for Neo4j remains relatively underexplored. In this work, we present an automated, LLM-supervised pipeline to generate high-quality synthetic data for Text2Cypher. Our Cypher data generation pipeline introduces LLM-As-Database-Filler, a novel strategy for ensuring Cypher query correctness, which results in high-quality generations. Using our pipeline, we generate SynthCypher, a high-quality Text2Cypher dataset containing 29.8k instances across various domains, with queries of varying complexity. Training open-source LLMs like LLaMa-3.1-8B, Mistral-7B, and QWEN-7B on SynthCypher results in performance gains of up to 40% on…
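The abstract does not spell out how generated queries are verified, so the following is only a minimal sketch of a generic generate-then-verify filter, not the authors' implementation. It assumes each candidate carries an expected answer and that a query can be executed against a database whose contents are known. The names Candidate, filter_verified, and toy_execute are hypothetical, and toy_execute is a lookup table standing in for a real Neo4j instance.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """One synthetic Text2Cypher instance (hypothetical structure)."""
    question: str
    cypher: str
    expected: Any  # answer the query should return on the known database


def filter_verified(
    candidates: Iterable[Candidate],
    execute: Callable[[str], Any],
) -> List[Candidate]:
    """Keep only candidates whose query runs and returns the expected answer."""
    kept = []
    for cand in candidates:
        try:
            result = execute(cand.cypher)
        except Exception:
            # A query that fails to execute is discarded.
            continue
        if result == cand.expected:
            kept.append(cand)
    return kept


def toy_execute(cypher: str) -> Any:
    """Stand-in executor: a lookup table instead of a real graph database."""
    results = {
        "MATCH (p:Person) RETURN count(p)": 3,
        "MATCH (p:Person) RETURN count(p) + 1": 4,
    }
    if cypher not in results:
        raise ValueError("query could not be executed")
    return results[cypher]


candidates = [
    # Correct query: runs and matches the expected answer.
    Candidate("How many people are there?", "MATCH (p:Person) RETURN count(p)", 3),
    # Runs, but returns the wrong answer.
    Candidate("How many people are there?", "MATCH (p:Person) RETURN count(p) + 1", 3),
    # Malformed query (missing parenthesis): execution fails.
    Candidate("List all people.", "MATCH (p:Person RETURN p", None),
]

verified = filter_verified(candidates, toy_execute)
```

In this toy run only the first candidate survives: the second executes but disagrees with the expected answer, and the third fails to execute. In a real setting, execute would wrap a database driver call rather than a dictionary lookup.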
Taxonomy
Topics: Semantic Web and Ontologies · Graph Theory and Algorithms · Cognitive Computing and Networks
Methods: Sparse Evolutionary Training
