A Natural Language Interface for Efficient Data Retrieval in SDSS
Prathamesh Tamhane

TL;DR
This paper presents a natural language interface for SDSS that uses a fine-tuned transformer model to translate user queries into SQL, making data retrieval accessible to non-experts.
Contribution
It introduces a novel approach of fine-tuning a compact transformer model on domain-specific NL-SQL pairs for astronomical data retrieval.
Findings
The fine-tuned model produces syntactically valid SQL queries
Translations of user requests are largely semantically correct in a preliminary evaluation
Effective for diverse astronomy data queries
Abstract
Modern astronomical surveys such as the Sloan Digital Sky Survey (SDSS) provide extensive astronomical databases, enabling researchers to access vast amounts of diverse data. However, retrieving data from these archives requires knowledge of query languages and familiarity with the archive schema, which presents a barrier for non-experts. This work investigates the use of Microsoft Phi-2, a compact yet powerful transformer-based language model, fine-tuned on natural language–SQL pairs constructed from SDSS query examples. We develop an interface that translates user queries in natural language into SQL commands compatible with SDSS SkyServer. Preliminary evaluation shows that the fine-tuned model produces syntactically valid and largely semantically correct queries across a variety of astronomy-related requests. Our results show that even small-scale models, when carefully fine-tuned, can provide…
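To make the pipeline in the abstract concrete, here is a minimal Python sketch of how an NL-SQL pair might be formatted for fine-tuning and how generated text might be turned back into a query. It is an illustration under stated assumptions, not the paper's implementation: the prompt template, all function names, and the sanity check are hypothetical, and the model is represented by a plain callable so the snippet runs without downloading Phi-2. The example query uses the SkyServer PhotoObj table purely for illustration.

```python
from typing import Callable

# Hypothetical prompt template; the paper's actual format is not given here.
PROMPT_TEMPLATE = "### Question:\n{question}\n### SQL:\n"
SQL_MARKER = "### SQL:\n"


def build_training_example(question: str, sql: str) -> str:
    """Join one NL-SQL pair into a single fine-tuning string."""
    return PROMPT_TEMPLATE.format(question=question.strip()) + sql.strip()


def build_inference_prompt(question: str) -> str:
    """Same template as training, with the SQL part left for the model."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_sql(generated: str) -> str:
    """Keep the text after the SQL marker, up to the first semicolon."""
    _, _, tail = generated.partition(SQL_MARKER)
    return tail.split(";", 1)[0].strip()


def looks_like_select(sql: str) -> bool:
    """Cheap sanity check (not a parser): a SELECT that has a FROM clause."""
    normalized = " ".join(sql.upper().split())
    return normalized.startswith("SELECT") and " FROM " in f" {normalized} "


def translate(question: str, generate: Callable[[str], str]) -> str:
    """Turn a question into SQL using any text-generation callable."""
    sql = extract_sql(generate(build_inference_prompt(question)))
    if not looks_like_select(sql):
        raise ValueError(f"model output is not a SELECT query: {sql!r}")
    return sql


if __name__ == "__main__":
    # Stand-in for the fine-tuned model: echoes the prompt plus a fixed query.
    def fake_model(prompt: str) -> str:
        return prompt + "SELECT TOP 10 objID, ra, dec FROM PhotoObj WHERE type = 3;"

    print(translate("Show me ten galaxies with their coordinates", fake_model))
```

In a real setup, `generate` would wrap the fine-tuned model's text-generation call. The check in `looks_like_select` only rejects obviously wrong output; it does not verify that the query is valid against the SkyServer schema.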
