Database Normalization via Dual-LLM Self-Refinement
Eunjae Jo, Nakyung Lee, Gyuyeong Kim

TL;DR
Miffie is an automated database normalization framework that pairs two large language models, one for schema generation and one for verification, to achieve high accuracy without human effort.
Contribution
Miffie introduces a dual-model self-refinement architecture that automates normalization: one LLM generates a candidate normalized schema, a second LLM verifies it, and the generator revises the schema based on the verifier's feedback.
Findings
Successfully normalizes complex schemas
Maintains high accuracy in normalization
Reduces manual effort in database schema design
Abstract
Database normalization is crucial to preserving data integrity. However, it is time-consuming and error-prone, as it is typically performed manually by data engineers. To this end, we present Miffie, a database normalization framework that leverages the capability of large language models. Miffie enables automated data normalization without human effort while preserving high accuracy. The core of Miffie is a dual-model self-refinement architecture that combines the best-performing models for normalized schema generation and verification, respectively. The generation module eliminates anomalies based on the feedback of the verification module until the output schema satisfies the requirement for normalization. We also carefully design task-specific zero-shot prompts to guide the models for achieving both high accuracy and cost efficiency. Experimental results show that Miffie can…
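The generate-and-verify loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `self_refine`, `Verdict`, `Result`, the `max_rounds` cap, and the two stub functions standing in for the LLM calls are all assumptions made for this example. The abstract does not specify the prompts, the models, or the stopping rule beyond "until the output schema satisfies the requirement for normalization".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Hypothetical verifier output: pass/fail plus feedback text."""
    ok: bool
    feedback: str


@dataclass
class Result:
    """Hypothetical loop output."""
    schema: str
    rounds: int
    converged: bool


def self_refine(
    source_schema: str,
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Verdict],
    max_rounds: int = 5,  # assumed safety cap, not taken from the paper
) -> Result:
    """Alternate generation and verification until the verifier accepts."""
    feedback: Optional[str] = None
    candidate = source_schema
    for round_no in range(1, max_rounds + 1):
        # The generator sees the original schema plus the latest feedback.
        candidate = generate(source_schema, feedback)
        verdict = verify(candidate)
        if verdict.ok:
            return Result(candidate, round_no, True)
        feedback = verdict.feedback
    # Cap reached: return the last candidate and flag non-convergence.
    return Result(candidate, max_rounds, False)


# Deterministic stand-ins for the two LLM calls (illustration only).
def stub_generate(source: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return source  # first attempt leaves the schema unchanged
    return "orders(order_id, customer_id); customers(customer_id, customer_name)"


def stub_verify(schema: str) -> Verdict:
    if "customers(" in schema:
        return Verdict(True, "")
    return Verdict(False, "customer_name depends on customer_id, not on the key order_id")


result = self_refine("orders(order_id, customer_id, customer_name)",
                     stub_generate, stub_verify)
print(result)
```

With these stubs, the first candidate is rejected because `customer_name` depends on a non-key attribute, and the second candidate, produced with that feedback, is accepted. In a real system each stub would be replaced by a call to a separate model, which is the point of the dual-model design: the model that is best at generation need not be the one that is best at verification.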
