kRAIG: A Natural Language-Driven Agent for Automated DataOps Pipeline Generation
Rohan Siva, Kai Cheung, Lichi Li, and Ganesh Sundaram

TL;DR
kRAIG is an AI agent that converts natural language descriptions into executable Kubeflow Pipelines, improving automation, accuracy, and reliability in data engineering workflows through explicit intent clarification and validation stages.
Contribution
This work introduces kRAIG, a novel LLM-based system that automates data pipeline generation with explicit intent clarification and validation, outperforming existing approaches.
Findings
3x improvement in extraction and loading success
25% increase in transformation accuracy
Enhanced pipeline reliability and safety
Abstract
Modern machine learning systems rely on complex data engineering workflows to extract, load, and transform (ELT) data into production pipelines. However, constructing these pipelines remains time-consuming and requires substantial expertise in data infrastructure and orchestration frameworks. Recent advances in large language model (LLM) agents offer a potential path toward automating these workflows, but existing approaches struggle with under-specified user intent, unreliable tool generation, and limited guarantees of executable outputs. We introduce kRAIG, an AI agent that translates natural language specifications into production-ready Kubeflow Pipelines (KFP). To resolve ambiguity in user intent, we propose ReQuesAct (Reason, Question, Act), an interaction framework that explicitly clarifies intent prior to pipeline synthesis. The system orchestrates end-to-end data movement from…
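The Reason, Question, Act cycle described above can be illustrated with a minimal sketch. It assumes that a user's intent is a set of named slots (`source`, `destination`, `transform`) that must all be filled before synthesis. It also assumes that the LLM reasoning step can be stood in for by a simple missing-slot check, and that plain step descriptions can take the place of a generated Kubeflow Pipeline. All names here (`Intent`, `reason`, `question`, `act`, `requesact`) are hypothetical and are not kRAIG's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical slots an ELT request must fill before synthesis (assumption,
# not taken from the paper).
REQUIRED_SLOTS = ("source", "destination", "transform")


@dataclass
class Intent:
    """Partially specified user intent, keyed by slot name."""
    slots: Dict[str, Optional[str]] = field(default_factory=dict)


def reason(intent: Intent) -> List[str]:
    """Reason: list required slots that are still missing or empty."""
    return [s for s in REQUIRED_SLOTS if not intent.slots.get(s)]


def question(missing: List[str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Question: ask the user one clarifying question per missing slot."""
    return {slot: ask(f"Please specify the {slot}:") for slot in missing}


def act(intent: Intent) -> List[str]:
    """Act: emit ordered ELT steps (a stand-in for real KFP synthesis)."""
    s = intent.slots
    return [
        f"extract from {s['source']}",
        f"load into {s['destination']}",
        f"transform with {s['transform']}",
    ]


def requesact(intent: Intent, ask: Callable[[str], str], max_rounds: int = 3) -> List[str]:
    """Alternate Reason and Question until intent is complete, then Act."""
    for _ in range(max_rounds):
        missing = reason(intent)
        if not missing:
            break
        intent.slots.update(question(missing, ask))
    missing = reason(intent)
    if missing:
        # Refuse to synthesize a pipeline from under-specified intent.
        raise ValueError(f"intent still under-specified after {max_rounds} rounds: {missing}")
    return act(intent)


# Usage: the destination is missing, so one clarifying question is asked.
answers = {"Please specify the destination:": "warehouse.sales"}
intent = Intent({"source": "s3://bucket/raw.csv", "transform": "dedupe rows"})
steps = requesact(intent, ask=lambda q: answers[q])
print(steps)
```

In this sketch, the clarification loop runs before `act` is ever called, so synthesis only happens once every required slot is filled. This mirrors the abstract's point that intent is clarified prior to pipeline synthesis. The bounded `max_rounds` keeps a non-responsive user from stalling the agent indefinitely.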
Taxonomy
Topics: Scientific Computing and Data Management · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
