Querying Structured Data Through Natural Language Using Language Models

Hontan Valentin-Micu; Bunea Andrei-Alexandru; Tantaroudas Nikolaos Dimitrios; Popovici Dan-Matei

arXiv:2604.03057 · cs.CL · April 6, 2026

TL;DR

This paper introduces a methodology for querying structured datasets with natural language by training a compact language model to generate executable queries, enabling resource-efficient and accurate data access.

Contribution

It presents a pipeline for synthetic data generation and fine-tunes a small model to handle structured data queries, outperforming larger models in resource-constrained settings.
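
To make the idea of a synthetic data pipeline concrete, here is a minimal sketch of template-based generation of question-query pairs. It is not the authors' pipeline: the table name `accessibility`, its columns, the example municipalities and services, the templates, and the choice of SQL as the target query language are all assumptions made for illustration.

```python
import itertools
import random

# Illustrative values only; the paper's actual schema and vocabulary are not shown here.
LOCATIONS = ["Durango", "Elorrio", "Abadiño"]
SERVICES = ["pharmacy", "school", "health centre"]

# Each hypothetical template pairs several question phrasings with one parameterised query.
TEMPLATES = [
    {
        "questions": [
            "How long does it take to reach the nearest {service} from {location}?",
            "What is the travel time to a {service} in {location}?",
        ],
        "query": (
            "SELECT minutes FROM accessibility "
            "WHERE location = '{location}' AND service = '{service}'"
        ),
    },
    {
        "questions": [
            "Is there a {service} within 15 minutes of {location}?",
            "Can residents of {location} reach a {service} in 15 minutes or less?",
        ],
        "query": (
            "SELECT minutes <= 15 FROM accessibility "
            "WHERE location = '{location}' AND service = '{service}'"
        ),
    },
]


def generate_pairs(seed=0):
    """Return shuffled question/query training pairs, one per template and slot combination."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    pairs = []
    for template, location, service in itertools.product(TEMPLATES, LOCATIONS, SERVICES):
        # Pick one phrasing at random to vary the surface form of the question.
        question = rng.choice(template["questions"]).format(
            location=location, service=service
        )
        query = template["query"].format(location=location, service=service)
        pairs.append({"question": question, "query": query})
    rng.shuffle(pairs)
    return pairs


pairs = generate_pairs()
```

Each pair can then be serialised as a prompt/completion example for fine-tuning. Holding some locations out of `LOCATIONS` during generation would be one way to build an unseen-location test split.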

Findings

01 High accuracy across monolingual, multilingual, and unseen-location scenarios

02 Effective on a dataset describing accessibility to essential services in Durangaldea, Spain

03 Small models can achieve high precision without relying on large proprietary LLMs

Abstract

This paper presents an open-source methodology for allowing users to query structured, non-textual datasets through natural language. Unlike Retrieval-Augmented Generation (RAG), which struggles with numerical and highly structured information, our approach trains an LLM to generate executable queries. To support this capability, we introduce a principled pipeline for synthetic training data generation, producing diverse question-answer pairs that capture both user intent and the semantics of the underlying dataset. We fine-tune a compact model (DeepSeek R1 Distill 8B) using QLoRA with 4-bit quantization, making the system suitable for deployment on commodity hardware. We evaluate our approach on a dataset describing accessibility to essential services across Durangaldea, Spain. The fine-tuned model achieves high accuracy across monolingual, multilingual, and unseen-location scenarios, demonstrating both…
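
The core idea of the abstract, having the model emit a query that is executed against the data rather than retrieving text passages, can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the query language (SQL via Python's `sqlite3`), the table layout, the helper `run_generated_query`, and every number in the demo table are made up for the example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with invented values (not taken from the paper's dataset)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accessibility (location TEXT, service TEXT, minutes REAL)"
    )
    conn.executemany(
        "INSERT INTO accessibility VALUES (?, ?, ?)",
        [
            ("Durango", "pharmacy", 4.0),
            ("Elorrio", "pharmacy", 8.0),
            ("Durango", "school", 6.0),
        ],
    )
    conn.commit()
    return conn


def run_generated_query(conn, sql):
    """Execute a model-generated query, accepting only a single SELECT statement.

    The check is deliberately conservative: any query containing an inner
    semicolon is rejected, even if the semicolon sits inside a string literal.
    """
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(statement).fetchall()


conn = build_demo_db()
# Stand-in for the fine-tuned model's output for the question
# "What is the average travel time to a pharmacy?"
generated = "SELECT AVG(minutes) FROM accessibility WHERE service = 'pharmacy';"
rows = run_generated_query(conn, generated)
print(rows)  # [(6.0,)]
```

Because the number is computed by the database engine, the answer is exact for aggregates such as averages, which is the kind of numerical question the abstract says retrieval-based approaches struggle with.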
