E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning

Lingzhe Zhang; Yunpeng Zhai; Tong Jia; Minghua He; Chiming Duan; Zhaoyang Liu; Bolin Ding; Ying Li

arXiv:2604.11094 · cs.SE · April 14, 2026


TL;DR

This paper presents E2E-REME, an end-to-end reinforcement fine-tuned model for microservice auto-remediation, outperforming existing LLM-based approaches in accuracy and efficiency.

Contribution

It introduces a new task, E2E-MR, and a benchmark, MicroRemed, along with a novel auto-remediation model trained via experience-simulation reinforcement fine-tuning.

Findings

1. E2E-REME achieves higher accuracy than nine baseline LLMs.
2. E2E-REME demonstrates improved efficiency in microservice failure recovery.
3. The benchmark MicroRemed enables comprehensive evaluation of auto-remediation methods.

Abstract

Contemporary microservice systems continue to grow in scale and complexity, leading to increasingly frequent and costly failures. While recent LLM-based auto-remediation approaches have emerged, they primarily translate textual instructions into executable Ansible playbooks and rely on expert-crafted prompts, lacking runtime knowledge guidance and depending on large-scale general-purpose LLMs, which limits their accuracy and efficiency. We introduce End-to-End Microservice Remediation (E2E-MR), a new task that requires directly generating executable playbooks from diagnosis reports to autonomously restore faulty systems. To enable rigorous evaluation, we build MicroRemed, a benchmark that automates microservice deployment, failure injection, playbook execution, and post-repair verification. We further propose E2E-REME, an end-to-end auto-remediation model…
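The abstract describes the MicroRemed benchmark as a four-stage loop: deployment, failure injection, playbook execution, and post-repair verification. The E2E-MR task sits inside that loop and maps a diagnosis report to a playbook. Below is a toy, in-memory sketch of that loop, written under explicit assumptions. It is not the authors' implementation. `ToySystem`, `evaluate_episode`, `success_rate`, `naive_remediator` and the `(action, target)` playbook format are all hypothetical. Real MicroRemed playbooks are Ansible playbooks executed against a live deployment. The success-rate metric is likewise an illustrative assumption, not the paper's definition of accuracy.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical playbook format: an ordered list of (action, target) steps.
# (Assumption: real MicroRemed playbooks are Ansible YAML run on a live system.)
Playbook = list[tuple[str, str]]
# A remediator maps a diagnosis report (text) to a playbook, as in E2E-MR.
Remediator = Callable[[str], Playbook]


@dataclass
class ToySystem:
    """Hypothetical in-memory stand-in for a deployed microservice system."""
    healthy: dict[str, bool] = field(default_factory=dict)

    def deploy(self, services: list[str]) -> None:
        self.healthy = {name: True for name in services}

    def inject_failure(self, service: str) -> None:
        self.healthy[service] = False

    def all_healthy(self) -> bool:
        return all(self.healthy.values())


def execute_playbook(system: ToySystem, playbook: Playbook) -> None:
    """Apply each step in order; this toy only understands 'restart'."""
    for action, target in playbook:
        if action == "restart" and target in system.healthy:
            system.healthy[target] = True


def evaluate_episode(services: list[str], faulty: str, remediate: Remediator) -> bool:
    system = ToySystem()
    system.deploy(services)                       # 1. automated deployment
    system.inject_failure(faulty)                 # 2. failure injection
    report = f"diagnosis: {faulty} is unhealthy"  # toy diagnosis report
    playbook = remediate(report)                  # model maps report -> playbook
    execute_playbook(system, playbook)            # 3. playbook execution
    return system.all_healthy()                   # 4. post-repair verification


def success_rate(services: list[str], remediate: Remediator) -> float:
    """Fraction of single-fault episodes that the remediator fully repairs."""
    outcomes = [evaluate_episode(services, s, remediate) for s in services]
    return sum(outcomes) / len(outcomes)


def naive_remediator(report: str) -> Playbook:
    """Hypothetical baseline: restart whichever service the report names."""
    service = report.split()[1]
    return [("restart", service)]


services = ["cart", "checkout", "payment"]
print(success_rate(services, naive_remediator))  # 1.0
print(success_rate(services, lambda r: []))      # 0.0 (empty playbook repairs nothing)
```

Keeping the remediator as a plain `report -> playbook` function mirrors the end-to-end framing of the task. Any model, whether a prompted LLM or a fine-tuned one, could be plugged in without changing the harness, and success is judged only by the post-repair health check.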
