Aligning Large Language Models to a Domain-specific Graph Database for NL2GQL

Yuanyuan Liang; Keren Tan; Tingyu Xie; Wenbiao Tao; Siyuan Wang; Yunshi Lan; Weining Qian

arXiv:2402.16567 · cs.CL · September 6, 2024 · 1 citation

Open Access

TL;DR

This paper presents a pipeline that uses ChatGPT to generate domain-specific NL-GQL data pairs, fine-tunes LLMs for NL2GQL tasks, and emphasizes the importance of schema relevance, achieving significant performance improvements in finance and medicine domains.

Contribution

The paper introduces a novel method for generating domain-specific NL-GQL data using ChatGPT and schema extraction, enhancing LLM alignment for NL2GQL tasks.

Findings

01 Significant improvements in EM and EX metrics over baselines.

02 Effective use of schema relevance for accurate GQL generation.

03 Successful application in finance and medicine domains.

Abstract

Graph Databases (Graph DB) find extensive application across diverse domains such as finance, social networks, and medicine. Yet, the translation of Natural Language (NL) into the Graph Query Language (GQL), referred to as NL2GQL, poses significant challenges owing to its intricate and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like text2SQL. Nonetheless, in the realm of NL2GQL tasks tailored to a particular domain, the absence of domain-specific NL-GQL data pairs adds complexity to aligning LLMs with the graph DB. To tackle this challenge, we present a well-defined pipeline. Initially, we utilize ChatGPT to generate NL-GQL data pairs, leveraging the provided graph DB with self-instruction. Subsequently, we employ the generated data to fine-tune LLMs, ensuring alignment between LLMs and the graph DB. Moreover, we…
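The data-preparation idea in the abstract (NL-GQL pairs used to fine-tune an LLM, with attention to the relevant part of the schema) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `SCHEMA`, the helper names `relevant_schema` and `build_record`, the prompt layout, and the Cypher-style query syntax. It only shows one plausible way to attach a pruned schema to each training record.

```python
import json
import re

# Hypothetical toy schema; the paper's actual graph schemas are not shown here.
SCHEMA = {
    "nodes": {
        "Company": ["name", "industry"],
        "Person": ["name", "age"],
        "Drug": ["name"],
    },
    "edges": {
        "INVESTS_IN": ("Person", "Company"),
        "TREATS": ("Drug", "Disease"),
    },
}


def relevant_schema(gql, schema):
    """Keep only node labels and edge types whose names appear in the query.

    This is a naive token match, assumed here for illustration only.
    """
    tokens = set(re.findall(r"[A-Za-z_]+", gql))
    nodes = {k: v for k, v in schema["nodes"].items() if k in tokens}
    edges = {k: v for k, v in schema["edges"].items() if k in tokens}
    return {"nodes": nodes, "edges": edges}


def build_record(nl, gql, schema):
    """Format one NL-GQL pair as a prompt/completion record (assumed layout)."""
    sub = relevant_schema(gql, schema)
    prompt = f"Schema: {json.dumps(sub, sort_keys=True)}\nQuestion: {nl}\nGQL:"
    return {"prompt": prompt, "completion": " " + gql}


# Hypothetical example pair; the query uses Cypher-style syntax as a stand-in.
record = build_record(
    "Which companies does Alice invest in?",
    "MATCH (p:Person {name: 'Alice'})-[:INVESTS_IN]->(c:Company) RETURN c.name",
    SCHEMA,
)
print(record["prompt"])
```

Because this sketch prunes the schema using the gold query, it only applies when building training data; at inference time the relevant schema would have to be selected from the question alone, which this example does not attempt.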

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training