Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement
Aaditya Shukla, Sidney Knowles, Meenakshi Madugula, Dave Farris, Ryan Angilly, Santiago Pombo, Anbang Xu, Lu An, Abhinav Balasubramanian, Tan Yu, Jiaxiang Ren, Rama Akkiraju

TL;DR
This paper presents a practical implementation of an adaptive data flywheel using MAPE control loops to improve enterprise AI agents, demonstrating continuous learning, failure mitigation, and significant latency and accuracy gains in a real-world setting.
Contribution
It introduces a structured MAPE-driven data flywheel framework for enterprise AI, enabling continuous, automated improvements based on real-world feedback and failure analysis.
Findings
Achieved 96% accuracy in routing with a fine-tuned 8B model
Reduced latency by up to 70% through targeted fine-tuning
Identified and addressed key failure modes in RAG pipelines
Abstract
Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, NVIDIA's Mixture-of-Experts (MoE) Knowledge Assistant serving over 30,000 employees. By operationalizing a MAPE-driven data flywheel, we built a closed-loop system that systematically addresses failures in retrieval-augmented generation (RAG) pipelines and enables continuous learning. Over a 3-month post-deployment period, we monitored feedback and collected 495 negative samples. Analysis revealed two major failure modes: routing errors (5.25%) and query rephrasal errors (3.2%). Using NVIDIA NeMo microservices, we implemented targeted improvements through fine-tuning. For routing, we replaced a Llama 3.1 70B model with a fine-tuned 8B variant, achieving 96% accuracy, a 10x reduction in model size,…
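The MAPE (Monitor, Analyze, Plan, Execute) cycle described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Feedback` record, the failure-mode labels, the `min_share` threshold, and the `fine_tune` callable, which is a hypothetical stand-in for whatever job launcher performs the fine-tuning (it is not the NVIDIA NeMo microservices API). The toy data is invented and does not reproduce the paper's 495 negative samples.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Feedback:
    """One user interaction (hypothetical schema)."""
    query: str
    positive: bool
    failure_mode: Optional[str] = None  # hypothetical label, e.g. "routing"


@dataclass
class FlywheelState:
    negatives: list[Feedback] = field(default_factory=list)


def monitor(stream: Iterable[Feedback], state: FlywheelState) -> None:
    """Monitor: retain only negative feedback for later analysis."""
    state.negatives.extend(fb for fb in stream if not fb.positive)


def analyze(state: FlywheelState) -> Counter:
    """Analyze: tally failure modes among the collected negative samples."""
    return Counter(fb.failure_mode or "unlabeled" for fb in state.negatives)


def plan(counts: Counter, min_share: float = 0.1) -> list[str]:
    """Plan: target labeled failure modes whose share of negatives meets a threshold."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        mode
        for mode, n in counts.most_common()
        if mode != "unlabeled" and n / total >= min_share
    ]


def execute(targets: list[str], fine_tune: Callable[[str], str]) -> dict[str, str]:
    """Execute: launch one improvement job per targeted failure mode."""
    return {mode: fine_tune(mode) for mode in targets}


def run_cycle(
    stream: Iterable[Feedback],
    fine_tune: Callable[[str], str],
    min_share: float = 0.1,
) -> dict[str, str]:
    """Run one full Monitor -> Analyze -> Plan -> Execute iteration."""
    state = FlywheelState()
    monitor(stream, state)
    counts = analyze(state)
    targets = plan(counts, min_share)
    return execute(targets, fine_tune)


if __name__ == "__main__":
    toy_stream = [
        Feedback("q1", positive=True),
        Feedback("q2", positive=False, failure_mode="routing"),
        Feedback("q3", positive=False, failure_mode="routing"),
        Feedback("q4", positive=False, failure_mode="rephrasal"),
        Feedback("q5", positive=False),
    ]
    # The lambda stands in for a real fine-tuning job launcher.
    print(run_cycle(toy_stream, fine_tune=lambda mode: f"job-{mode}", min_share=0.3))
```

With this toy stream, four interactions are negative; "routing" accounts for half of them and is the only labeled mode above the 0.3 threshold, so a single job is planned for it. In a real flywheel the loop would repeat, with each Execute step feeding updated models back into the monitored system.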
