Automatic Data Transformation Using Large Language Model: An Experimental Study on Building Energy Data
Ankita Sharma, Xuanmao Li, Hong Guan, Guoxin Sun, Liang Zhang, Lanjun Wang, Kesheng Wu, Lei Cao, Erkang Zhu, Alexander Sim, Teresa Wu, Jia Zou

TL;DR
This paper introduces SQLMorpher, an LLM-based framework that generates SQL to transform building energy datasets automatically. It reaches 96% accuracy on 105 real-world problems and gives domain experts a simpler way to supply domain knowledge.
Contribution
It pioneers an end-to-end LLM-based approach for data transformation, including a benchmark dataset and extensive empirical evaluation.
Findings
Achieved 96% accuracy on 105 real-world problems
Demonstrated effectiveness of LLMs in complex domain-specific data transformation
Developed an iterative prompt optimization mechanism
Abstract
Existing approaches to automatic data transformation are insufficient to meet the requirements in many real-world scenarios, such as the building sector. First, there is no convenient interface for domain experts to provide domain knowledge easily. Second, they require significant training data collection overheads. Third, the accuracy suffers from complicated schema changes. To bridge this gap, we present a novel approach that leverages the unique capabilities of large language models (LLMs) in coding, complex reasoning, and zero-shot learning to generate SQL code that transforms the source datasets into the target datasets. We demonstrate the viability of this approach by designing an LLM-based framework, termed SQLMorpher, which comprises a prompt generator that integrates the initial prompt with optional domain knowledge and historical patterns in external databases. It also…
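The idea described in the abstract, a prompt that combines schema information with optional domain knowledge and asks for SQL that maps a source table to a target table, can be sketched roughly as follows. This is a minimal illustration, not SQLMorpher's actual implementation: the function names (`build_prompt`, `run_transformation`), the table and column names, and the prompt wording are all hypothetical, and the SQL string is hand-written here as a stand-in for what an LLM might return.

```python
import sqlite3


def build_prompt(source_schema, target_schema, domain_knowledge=None):
    """Assemble a text prompt asking for SQL that maps source to target.

    Hypothetical helper: the wording is an assumption, not the paper's prompt.
    """
    parts = [
        "Write a SQL query that transforms the source table into the target table.",
        f"Source schema: {source_schema}",
        f"Target schema: {target_schema}",
    ]
    if domain_knowledge:
        # Domain knowledge is optional, mirroring the abstract's description.
        parts.append(f"Domain knowledge: {domain_knowledge}")
    return "\n".join(parts)


def run_transformation(conn, sql):
    """Execute a candidate transformation statement against the database."""
    conn.execute(sql)
    conn.commit()


# Toy source and target tables (hypothetical building sensor readings).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_readings (ts TEXT, temp_f REAL)")
conn.execute("INSERT INTO source_readings VALUES ('2023-01-01T00:00', 32.0)")
conn.execute("INSERT INTO source_readings VALUES ('2023-01-01T01:00', 212.0)")
conn.execute("CREATE TABLE target_readings (timestamp TEXT, temp_c REAL)")

prompt = build_prompt(
    "source_readings(ts TEXT, temp_f REAL)",
    "target_readings(timestamp TEXT, temp_c REAL)",
    domain_knowledge="temp_f is in Fahrenheit; temp_c must be in Celsius.",
)

# Stand-in for the LLM's answer to `prompt`; no model is called in this sketch.
candidate_sql = """
INSERT INTO target_readings (timestamp, temp_c)
SELECT ts, (temp_f - 32.0) * 5.0 / 9.0 FROM source_readings
"""
run_transformation(conn, candidate_sql)

rows = conn.execute(
    "SELECT timestamp, temp_c FROM target_readings ORDER BY timestamp"
).fetchall()
print(rows)
```

In a real pipeline the candidate SQL would come from a model call and would need validation against the target schema before being trusted. The sketch only shows how schema details and expert hints might be folded into a single prompt.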
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
