kRAIG: A Natural Language-Driven Agent for Automated DataOps Pipeline Generation

Rohan Siva, Kai Cheung, Lichi Li, and Ganesh Sundaram

arXiv:2603.20311 · cs.SE · March 24, 2026

Open Access

TL;DR

kRAIG is an AI agent that converts natural language descriptions into executable Kubeflow Pipelines, improving automation, accuracy, and reliability in data engineering workflows through explicit intent clarification and validation stages.

Contribution

This work introduces kRAIG, a novel LLM-based system that automates data pipeline generation with explicit intent clarification and validation, outperforming existing approaches.

Findings

01. 3x improvement in extraction and loading success
02. 25% increase in transformation accuracy
03. Enhanced pipeline reliability and safety

Abstract

Modern machine learning systems rely on complex data engineering workflows to extract, load, and transform (ELT) data into production pipelines. However, constructing these pipelines remains time-consuming and requires substantial expertise in data infrastructure and orchestration frameworks. Recent advances in large language model (LLM) agents offer a potential path toward automating these workflows, but existing approaches struggle with under-specified user intent, unreliable tool generation, and limited guarantees of executable outputs. We introduce kRAIG, an AI agent that translates natural language specifications into production-ready Kubeflow Pipelines (KFP). To resolve ambiguity in user intent, we propose ReQuesAct (Reason, Question, Act), an interaction framework that explicitly clarifies intent prior to pipeline synthesis. The system orchestrates end-to-end data movement from…
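The abstract describes ReQuesAct only at a high level: reason about the request, question the user where intent is unclear, then act. The following is a minimal sketch of that control flow, not the authors' implementation. The slot names (`source`, `transform`, `destination`), the `PipelineSpec` class, and the `clarify_then_act` function are all hypothetical, and a simple "is the slot filled" check stands in for whatever LLM reasoning the real system performs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical: the pieces of intent that must be known before synthesis.
REQUIRED_SLOTS = ("source", "transform", "destination")


@dataclass
class PipelineSpec:
    """Illustrative container for the user's clarified intent."""
    slots: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        # "Reason" step (assumed): a slot is ambiguous if it is absent or empty.
        return [s for s in REQUIRED_SLOTS if not self.slots.get(s)]


def clarify_then_act(
    initial: Dict[str, str],
    ask: Callable[[str], str],
) -> Tuple[PipelineSpec, List[str]]:
    """Loop Reason -> Question until nothing is missing, then Act.

    `ask` is a hypothetical callback that takes a slot name and returns
    the user's answer for that slot.
    """
    spec = PipelineSpec(dict(initial))
    asked: List[str] = []
    while True:
        gaps = spec.missing()              # Reason: what is still unclear?
        if not gaps:
            break
        slot = gaps[0]
        answer = ask(slot)                 # Question: ask about one gap
        if not answer:
            raise ValueError(f"no answer given for {slot!r}")
        spec.slots[slot] = answer
        asked.append(slot)
    # Act: here the completed spec is simply returned; pipeline synthesis
    # itself is outside the scope of this sketch.
    return spec, asked
```

A caller might pass a partially specified request such as `{"source": "orders.csv"}` together with a callback that prompts the user; the loop then asks only about the transformation and the destination before returning a complete spec. Asking about one gap at a time is a design choice made for this sketch and is not something the abstract specifies.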

Taxonomy

Topics: Scientific Computing and Data Management · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)