LLM-Driven Data Generation and a Novel Soft Metric for Evaluating Text-to-SQL in Aviation MRO
Patrick Sutanto, Jonathan Kenrick, Max Lorenz, Joan Santoso

TL;DR
This paper introduces a soft F1-score metric for finer-grained evaluation of text-to-SQL models and an LLM-based data synthesis pipeline for creating domain-specific benchmarks in aviation MRO, improving both the assessment and the development of such models.
Contribution
It presents a novel soft metric for nuanced evaluation and an LLM-driven data generation method tailored to aviation MRO, addressing the coarseness of existing evaluation metrics and the scarcity of domain-specific evaluation data.
Findings
The soft metric offers more detailed insights than traditional execution accuracy.
The data synthesis pipeline effectively creates realistic question-SQL pairs.
Experiments on an authentic MRO database validate the improved evaluation and benchmarking in aviation MRO.
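The gap between binary execution accuracy and a soft, F1-based score can be illustrated with a minimal Python sketch. This summary does not give the paper's exact formulation, so the sketch assumes that the metric compares predicted and gold result rows as multisets, ignoring row order. The function names `execution_accuracy` and `soft_f1` and the sample aircraft rows are hypothetical.

```python
from collections import Counter
from typing import Sequence

# One result row, e.g. ("A320", 12); rows must be hashable (sqlite3 returns tuples).
Row = tuple


def execution_accuracy(pred: Sequence[Row], gold: Sequence[Row]) -> float:
    """Binary score: 1.0 only if both results hold exactly the same rows (order ignored)."""
    return 1.0 if Counter(pred) == Counter(gold) else 0.0


def soft_f1(pred: Sequence[Row], gold: Sequence[Row]) -> float:
    """Assumed soft metric: F1 over rows shared by the predicted and gold results."""
    if not pred and not gold:
        return 1.0  # both results empty: treated as a perfect match (assumption)
    # Multiset intersection keeps each shared row min(count_pred, count_gold) times.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical results: the prediction misses one of three gold rows.
gold = [("A320", 12), ("B737", 7), ("A350", 3)]
pred = [("A320", 12), ("B737", 7)]
print(execution_accuracy(pred, gold))  # 0.0
print(round(soft_f1(pred, gold), 3))   # 0.8
```

A prediction that returns two of the three expected rows scores 0.0 under exact matching but 0.8 under the row-level F1, which is the kind of partial credit a soft metric is meant to capture. A cell-level or column-aware overlap would be an equally plausible variant.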
Abstract
The application of Large Language Models (LLMs) to text-to-SQL tasks promises to democratize data access, particularly in critical industries like aviation Maintenance, Repair, and Operation (MRO). However, progress is hindered by two key challenges: the rigidity of conventional evaluation metrics such as execution accuracy, which offer coarse, binary feedback, and the scarcity of domain-specific evaluation datasets. This paper addresses these gaps. To enable more nuanced assessment, we introduce a novel F1-score-based 'soft' metric that quantifies the informational overlap between generated and ground-truth SQL results. To address data scarcity, we propose an LLM-driven pipeline that synthesizes realistic question-SQL pairs from database schemas. We demonstrate our contributions through an empirical evaluation on an authentic MRO database. Our experiments show that the proposed soft…
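The pipeline synthesizes question-SQL pairs from database schemas. One way such a loop could look is sketched below in Python, assuming a SQLite database and an arbitrary LLM exposed as a function from prompt text to response text. The prompt wording, the `Q:`/`SQL:` output format, the execution-based filter, the `work_orders` table, and the stubbed `fake_llm` are all illustrative assumptions, not the authors' pipeline.

```python
import sqlite3
from typing import Callable

# Hypothetical LLM interface: any function mapping a prompt to generated text.
LLM = Callable[[str], str]


def schema_text(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements stored in SQLite's catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def synthesize_pairs(conn: sqlite3.Connection, llm: LLM, n: int) -> list[tuple[str, str]]:
    """Ask the LLM for question-SQL pairs and keep those whose SQL executes."""
    prompt = (
        "Database schema:\n" + schema_text(conn) + "\n\n"
        f"Write {n} realistic questions an aviation MRO engineer might ask about "
        "this database. For each, output 'Q: <question>' on one line and "
        "'SQL: <query>' on the next."
    )
    pairs: list[tuple[str, str]] = []
    question = None
    for line in llm(prompt).splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("SQL:") and question:
            sql = line[4:].strip()
            try:
                # Assumed validation step: discard SQL that fails to execute.
                # Run LLM output only against a read-only copy of the database.
                conn.execute(sql).fetchall()
            except sqlite3.Error:
                pass
            else:
                pairs.append((question, sql))
            question = None
    return pairs


# Usage with a stubbed LLM and a toy, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_orders (id INTEGER PRIMARY KEY, tail_number TEXT, status TEXT)")


def fake_llm(prompt: str) -> str:
    return (
        "Q: How many work orders are open?\n"
        "SQL: SELECT COUNT(*) FROM work_orders WHERE status = 'open'\n"
        "Q: Which parts are on back order?\n"
        "SQL: SELECT part_no FROM parts WHERE backordered = 1\n"
    )


# The second pair is dropped because table "parts" does not exist.
print(synthesize_pairs(conn, fake_llm, n=2))
```

An execution filter of this kind only confirms that a query runs, not that it actually answers its question.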
Taxonomy
Topics: Advanced Computational Techniques and Applications
