Making MoE-based LLM Inference Resilient with Tarragon
Songyu Zhang, Aaron Tam, Myungjin Lee, Shixiong Qi, K. K. Ramakrishnan

TL;DR
Tarragon is a resilient MoE inference framework that confines failures to individual workers and keeps LLM inference running with minimal disruption by reconfiguring data paths and applying self-healing mechanisms.
Contribution
Tarragon introduces a novel failure confinement and recovery approach for MoE-based LLM inference that significantly reduces failure-induced stalls while maintaining performance during failure-free operation.
Findings
Reduces failure-induced stalls by 160–213×
Maintains performance during failure-free operation
Enables continuous inference despite worker failures
Abstract
Mixture-of-Experts (MoE) models are increasingly used to serve LLMs at scale, but failures become common as deployment scale grows. Existing systems exhibit poor failure resilience: even a single worker failure triggers a coarse-grained, service-wide restart, discarding accumulated progress and halting the entire inference pipeline during recovery, an approach clearly ill-suited for latency-sensitive LLM services. We present Tarragon, a resilient MoE inference framework that confines the impact of failures to individual workers while allowing the rest of the pipeline to continue making forward progress. Tarragon exploits the natural separation between attention and expert computation in MoE-based transformers, treating attention workers (AWs) and expert workers (EWs) as distinct failure domains. Tarragon introduces a reconfigurable datapath to mask failures by rerouting requests to…
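The core idea of masking an expert-worker failure by changing where tokens are routed, rather than restarting the whole service, can be illustrated with a small routing table. This is a minimal sketch under stated assumptions, not Tarragon's implementation: the `Datapath` class, the `dispatch` helper, and the worker names are hypothetical, and the sketch assumes each expert is replicated on more than one EW so that rerouting has a healthy target, which the excerpt above does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Datapath:
    """Hypothetical routing table: expert id -> expert workers (EWs) holding a replica.

    Assumption (illustration only): each expert may be replicated on several EWs,
    and a failed EW is masked by sending its tokens to a surviving replica.
    """
    replicas: dict[int, list[str]]
    failed: set[str] = field(default_factory=set)
    _next: dict[int, int] = field(default_factory=dict, repr=False)

    def mark_failed(self, worker: str) -> None:
        # Confine the failure: only this EW leaves the routing rotation;
        # attention workers and all other EWs keep serving.
        self.failed.add(worker)

    def mark_recovered(self, worker: str) -> None:
        # Once the worker (or its replacement) is healthy, return it to rotation.
        self.failed.discard(worker)

    def route(self, expert_id: int) -> str:
        """Pick a healthy EW for a token assigned to `expert_id` (round-robin)."""
        healthy = [w for w in self.replicas[expert_id] if w not in self.failed]
        if not healthy:
            # In this sketch, losing every replica cannot be masked by rerouting.
            raise RuntimeError(f"no healthy replica for expert {expert_id}")
        i = self._next.get(expert_id, 0) % len(healthy)
        self._next[expert_id] = i + 1
        return healthy[i]


def dispatch(datapath: Datapath, token_experts: list[int]) -> list[str]:
    """Map each token's routed expert id to a concrete EW under the current datapath."""
    return [datapath.route(e) for e in token_experts]


# Usage: expert 0 has replicas on ew0 and ew1; expert 1 lives only on ew2.
dp = Datapath(replicas={0: ["ew0", "ew1"], 1: ["ew2"]})
print(dispatch(dp, [0, 0, 1]))  # ['ew0', 'ew1', 'ew2']
dp.mark_failed("ew0")
print(dispatch(dp, [0, 0, 1]))  # ['ew1', 'ew1', 'ew2']
```

Only the failed EW drops out of routing; tokens for other experts are unaffected, which mirrors the idea of treating each worker as its own failure domain instead of triggering a service-wide restart.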
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed systems and fault tolerance · Parallel Computing and Optimization Techniques
