Database Normalization via Dual-LLM Self-Refinement

Eunjae Jo; Nakyung Lee; Gyuyeong Kim

arXiv:2508.17693 · cs.DB · September 1, 2025

TL;DR

Miffie is an automated database normalization framework using dual large language models for schema generation and verification, achieving high accuracy without human effort.

Contribution

The paper introduces a dual-model self-refinement architecture that automates normalization, pairing one LLM for normalized schema generation with a second LLM for verification.

Findings

01. Successfully normalizes complex schemas

02. Maintains high accuracy in normalization

03. Reduces manual effort in database schema design

Abstract

Database normalization is crucial to preserving data integrity. However, it is time-consuming and error-prone, as it is typically performed manually by data engineers. To this end, we present Miffie, a database normalization framework that leverages the capability of large language models. Miffie enables automated data normalization without human effort while preserving high accuracy. The core of Miffie is a dual-model self-refinement architecture that combines the best-performing models for normalized schema generation and verification, respectively. The generation module eliminates anomalies based on the feedback of the verification module until the output schema satisfies the requirement for normalization. We also carefully design task-specific zero-shot prompts to guide the models for achieving both high accuracy and cost efficiency. Experimental results show that Miffie can…
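The generate-then-verify loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `refine`, `Verdict`, `Result`, `toy_generate`, and `toy_verify`, the `max_rounds` cap, and the string-based schema format are all assumptions made for the example, and the two toy functions are plain Python stand-ins for the generation and verification models rather than real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """Outcome of one verification pass (hypothetical structure)."""
    ok: bool
    feedback: str = ""


@dataclass
class Result:
    schema: str
    accepted: bool
    rounds: int


# Assumed signatures: the generator sees the original schema plus the
# previous candidate and the verifier's feedback (both None on round one).
Generator = Callable[[str, Optional[str], Optional[str]], str]
Verifier = Callable[[str, str], Verdict]


def refine(original: str, generate: Generator, verify: Verifier,
           max_rounds: int = 3) -> Result:
    """Alternate generation and verification until the verifier accepts
    the candidate schema or the round budget is exhausted."""
    candidate = generate(original, None, None)
    for round_no in range(1, max_rounds + 1):
        verdict = verify(original, candidate)
        if verdict.ok:
            return Result(candidate, True, round_no)
        if round_no < max_rounds:
            # Feed the verifier's complaint back into the generator.
            candidate = generate(original, candidate, verdict.feedback)
    return Result(candidate, False, max_rounds)


def toy_generate(original: str, previous: Optional[str],
                 feedback: Optional[str]) -> str:
    """Stand-in for the generation model: returns the input unchanged
    at first, then splits the table once it receives feedback."""
    if feedback is None:
        return original
    return ("Orders(order_id, customer_id); "
            "Customers(customer_id, customer_name)")


def toy_verify(original: str, candidate: str) -> Verdict:
    """Stand-in for the verification model: accepts only candidates
    that moved customer data into its own table."""
    if "Customers(" in candidate:
        return Verdict(True)
    return Verdict(False, "customer_name depends only on customer_id; "
                          "move it to a separate table")


if __name__ == "__main__":
    result = refine("Orders(order_id, customer_id, customer_name)",
                    toy_generate, toy_verify)
    print(result)
```

In this sketch the loop stops at a fixed round budget and reports `accepted=False` if the verifier never approves a candidate; how Miffie itself bounds or terminates refinement is not stated in the text above.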
