MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis
Wei Sun, Ting Wang, Xinran Tian, Wanshun Lan, Xuhan Feng, Haoyue Li, and Fangxin Wang

TL;DR
MetaKube is an experience-aware LLM framework for Kubernetes failure diagnosis that learns from operational history, improving diagnostic accuracy and efficiency while maintaining data privacy.
Contribution
It introduces three components: the Episodic Pattern Memory Network (EPMN), a meta-cognitive controller, and KubeLLM, an 8B model fine-tuned on Kubernetes-specific data. Together they enable experiential learning and improve diagnostic performance.
Findings
MetaKube improves diagnosis accuracy from 50.9 to 90.5 points.
EPMN contributes a 15.3% improvement through experiential learning.
MetaKube approaches GPT-4.1 performance while ensuring data privacy.
Abstract
Existing LLM-based Kubernetes diagnostic systems cannot learn from operational experience: they operate on static knowledge bases and do not improve from past resolutions. We present MetaKube, an experience-aware LLM framework built on three synergistic innovations: (1) an Episodic Pattern Memory Network (EPMN) that abstracts diagnostic patterns from historical resolutions and provides confidence-calibrated retrieval for both rapid pattern matching and guided causal exploration, (2) a meta-cognitive controller that dynamically routes between intuitive and analytical pathways based on problem familiarity, optimizing the trade-off between speed and depth, and (3) KubeLLM, a locally-deployable 8B model enhanced through domain-specific post-training on our 7,000-sample Kubernetes Fault Resolution Dataset. Evaluation on 1,873 real-world scenarios demonstrates that MetaKube transforms Qwen3-8B from…
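The routing idea in the abstract (retrieve a stored pattern, then choose a fast or a deep pathway depending on how familiar the problem looks) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Pattern` class, the cosine-similarity confidence score, the `route` function, and the 0.8 threshold are all assumptions made for illustration, and the paper's confidence calibration is likely more involved.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Tuple


@dataclass
class Pattern:
    """Hypothetical stored diagnostic pattern (not the paper's schema)."""
    name: str
    embedding: List[float]
    resolution: str


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: List[float], memory: List[Pattern]) -> Tuple[Optional[Pattern], float]:
    """Return the most similar stored pattern and its similarity score."""
    best, score = None, 0.0
    for p in memory:
        s = cosine(query, p.embedding)
        if s > score:
            best, score = p, s
    return best, score


def route(query: List[float], memory: List[Pattern],
          threshold: float = 0.8) -> Tuple[str, Optional[Pattern]]:
    """Pick a pathway: 'intuitive' for familiar problems, else 'analytical'.

    The threshold is an assumed value, not one reported in the paper.
    """
    pattern, confidence = retrieve(query, memory)
    if pattern is not None and confidence >= threshold:
        return "intuitive", pattern   # reuse the stored resolution directly
    return "analytical", pattern      # pattern (if any) only guides exploration


memory = [
    Pattern("oom-killed", [1.0, 0.0], "raise the container memory limit"),
    Pattern("image-pull-backoff", [0.0, 1.0], "fix the image tag or registry credentials"),
]
familiar = route([0.9, 0.1], memory)
unfamiliar = route([1.0, 1.0], memory)
```

Here a query close to a stored embedding takes the intuitive pathway, while an ambiguous query falls through to the analytical one with the nearest pattern passed along as a hint.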
Taxonomy
Topics: Software System Performance and Reliability · Anomaly Detection Techniques and Applications · Machine Fault Diagnosis Techniques
