RAG-Driven Data Quality Governance for Enterprise ERP Systems
Sedat Bin Vedat, Enes Kutay Yarkan, Meftun Akarsu, Recep Kaan Karaman, Arda Sar, Çağrı Çelikbilek, Savaş Saygılı

TL;DR
This paper introduces an AI-driven data quality governance system for enterprise ERP, combining automated cleaning and GPT-4o-powered SQL query generation to improve accuracy, speed, and cost-efficiency in managing large employee datasets.
Contribution
The paper presents an end-to-end pipeline that integrates multi-stage data cleaning with retrieval-augmented generation for natural-language querying in Turkish, Russian, and English.
Findings
92.5% query validity achieved
Query turnaround time reduced from 2.3 days to under 5 seconds
System maintains 99.2% uptime and reduces costs by 68% compared to GPT-3.5
Abstract
Enterprise ERP systems managing hundreds of thousands of employee records face critical data quality challenges when human resources departments perform decentralized manual entry across multiple languages. We present an end-to-end pipeline combining automated data cleaning with LLM-driven SQL query generation, deployed on a production system managing 240,000 employee records over six months. The system operates in two integrated stages: a multi-stage cleaning pipeline that performs translation normalization, spelling correction, and entity deduplication during periodic synchronization from Microsoft SQL Server to PostgreSQL; and a retrieval-augmented generation framework powered by GPT-4o that translates natural-language questions in Turkish, Russian, and English into validated SQL queries. The query engine employs LangChain orchestration, FAISS vector similarity search, and few-shot…
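The retrieval and validation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the example store, the `employees` schema, and every function name below are hypothetical. A toy bag-of-words cosine similarity stands in for FAISS vector search, SQLite stands in for PostgreSQL, and the LLM call itself is omitted; the sketch only shows how similar question/SQL pairs might be retrieved into a few-shot prompt and how a generated query might be checked before execution.

```python
import math
import re
import sqlite3
from collections import Counter

# Hypothetical few-shot store of (question, SQL) pairs; not taken from the paper.
EXAMPLES = [
    ("How many employees work in each department?",
     "SELECT department, COUNT(*) FROM employees GROUP BY department"),
    ("List employees hired after 2020",
     "SELECT name FROM employees WHERE hire_date > '2020-12-31'"),
    ("What is the average salary by city?",
     "SELECT city, AVG(salary) FROM employees GROUP BY city"),
]

# Hypothetical schema used only to check that a query compiles.
SCHEMA = ("CREATE TABLE employees (name TEXT, department TEXT, "
          "city TEXT, salary REAL, hire_date TEXT);")


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the k stored examples most similar to the question."""
    q = embed(question)
    ranked = sorted(EXAMPLES, key=lambda ex: cosine(q, embed(ex[0])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, k=2):
    """Assemble a few-shot prompt from the retrieved examples."""
    shots = "\n\n".join(f"Q: {q}\nSQL: {s}" for q, s in retrieve(question, k))
    return f"{shots}\n\nQ: {question}\nSQL:"


def is_valid_select(sql, schema_ddl=SCHEMA):
    """Accept only a single SELECT statement that compiles against the schema."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt or not stmt.lower().startswith("select"):
        return False
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        conn.execute("EXPLAIN " + stmt)  # compiles the query without reading data
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()
```

In a real deployment the string returned by `build_prompt` would be sent to the language model, and only output that passes a check like `is_valid_select` would reach the database. Rejecting anything other than a single read-only statement is one plausible reading of "validated SQL"; the abstract does not specify the actual validation rules.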
Taxonomy
Topics: Data Quality and Management · Research Data Management Practices · Cloud Data Security Solutions
