Text2Cypher Across Languages: Evaluating and Finetuning LLMs
Makbule Gulcin Ozsoy, William Tai

TL;DR
This study evaluates and improves multilingual large language models for translating natural language questions into Cypher queries, revealing the impact of training data on cross-lingual performance and the benefits of multilingual finetuning.
Contribution
The paper introduces a multilingual dataset for Text2Cypher, compares foundational and finetuned models across languages, and demonstrates how multilingual finetuning enhances cross-lingual robustness.
Findings
Models perform best on English, worse on Spanish, and worst on Turkish.
Prompt translation has minimal impact on evaluation metrics.
Multilingual finetuning reduces performance gaps between languages.
Abstract
Recent advances in large language models (LLMs) have enabled natural language interfaces that translate user questions into database queries, such as Text2SQL, Text2SPARQL, and Text2Cypher. While these interfaces enhance database accessibility, most research today focuses on English, with limited evaluation in other languages. This paper investigates the performance of both foundational and finetuned LLMs on the Text2Cypher task across multiple languages. We create and release a multilingual dataset by translating English questions into Spanish and Turkish while preserving the original Cypher queries, enabling fair cross-lingual comparison. Using standardized prompts and metrics, we evaluate several foundational models and observe a consistent performance pattern: highest on English, followed by Spanish, and lowest on Turkish. We attribute this to differences in training data…
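The dataset construction described in the abstract (translated questions paired with unchanged Cypher queries) and a per-language comparison can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Example` record, the `make_multilingual` and `exact_match_by_language` helpers, the translation lookup table, and the use of exact string match as the score are all assumptions made for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Example:
    """Hypothetical record: one question with its reference Cypher query."""
    question: str
    cypher: str
    lang: str = "en"


def make_multilingual(examples, translations):
    """Add translated copies of each example, leaving the Cypher untouched.

    translations maps a language code to {english_question: translated_question}.
    """
    out = list(examples)
    for lang, table in translations.items():
        for ex in examples:
            if ex.question in table:
                # Only the question text and language tag change.
                out.append(replace(ex, question=table[ex.question], lang=lang))
    return out


def exact_match_by_language(examples, predict):
    """Fraction of questions per language whose predicted query equals the reference.

    Exact match is an assumed stand-in metric for this sketch.
    """
    totals, hits = {}, {}
    for ex in examples:
        totals[ex.lang] = totals.get(ex.lang, 0) + 1
        if predict(ex.question).strip() == ex.cypher.strip():
            hits[ex.lang] = hits.get(ex.lang, 0) + 1
    return {lang: hits.get(lang, 0) / n for lang, n in totals.items()}
```

Because every language shares the same reference query, any difference in the per-language scores comes from how the model handles the question text rather than from differences in the target queries.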
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
