Design and Implementation of an Automated Disaster-recovery System for a Kubernetes Cluster Using LSTM
Ji-Beom Kim, Je-Bum Choi, and Eun-Sung Jung

TL;DR
This paper presents an automated disaster-recovery system for Kubernetes clusters that uses an LSTM model to predict CPU utilization. The system recovers applications automatically within 15 seconds and improves data protection in cloud environments.
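The summary does not specify the model's architecture, input window, or features, so the following is only a minimal sketch of how an LSTM consumes a CPU-utilization history. It assumes utilization is normalized to [0, 1], uses a single scalar LSTM unit, and uses hypothetical, untrained weights (`WEIGHTS`). The helper names `make_windows`, `lstm_step`, and `forecast` are illustrative, not taken from the paper. A real forecaster would be trained in a framework such as Keras or PyTorch and would add a learned output layer.

```python
import math

# Hypothetical, untrained weights (w_x, w_h, bias) for each gate of a 1-unit LSTM.
# They only demonstrate the data flow; a real model would learn them from history.
WEIGHTS = {
    "input":  (0.5, 0.1, 0.0),
    "forget": (0.5, 0.1, 1.0),
    "output": (0.5, 0.1, 0.0),
    "cell":   (1.0, 0.1, 0.0),
}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_windows(series, window):
    """Split a CPU-utilization series into (history window, next value) pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def lstm_step(x, h, c, w=WEIGHTS):
    """One LSTM time step with scalar input x, hidden state h, and cell state c."""
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h + b

    i = sigmoid(pre("input"))    # how much new information to write
    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cell"))   # candidate cell contents
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c


def forecast(window, w=WEIGHTS):
    """Feed the window through the cell; in this sketch the final hidden state is the forecast."""
    h, c = 0.0, 0.0
    for x in window:
        h, c = lstm_step(x, h, c, w)
    return h
```

For example, `make_windows([0.1, 0.2, 0.3, 0.4], 2)` yields two training pairs, and `forecast([0.5, 0.6, 0.7])` returns a single value. With these untrained weights, that value is uncalibrated and serves only to show how the sequence is consumed.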
Contribution
The paper introduces a novel integrated system that combines Kubernetes management with LSTM-based prediction for automatic disaster recovery, reducing recovery time and preventing performance degradation.
Findings
Recovery process completed within 15 seconds without human intervention
LSTM prediction effectively prevents performance degradation
System enhances data management and recovery efficiency in cloud environments
Abstract
With the increasing importance of data in the modern business environment, effective data-management and protection strategies are attracting growing research attention. Data protection in a cloud environment is crucial for safeguarding information assets and maintaining sustainable services. This study introduces a system structure that integrates Kubernetes management platforms with backup and restoration tools. The system is designed to detect disasters immediately and automatically recover applications from another Kubernetes cluster. The experimental results show that the system completes the restoration process within 15 s without human intervention, enabling rapid recovery. This significantly reduces the potential for delays and errors compared with manual recovery, thereby improving data-management and recovery efficiency in cloud environments. Moreover,…
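The abstract states that the system detects disasters immediately and then restores applications without human intervention, but the summary does not describe the detection logic. The following is a minimal sketch that assumes a consecutive-failure rule over periodic health probes of the primary cluster. Such probes could, for example, poll the Kubernetes API server's `/readyz` endpoint, but that choice is an assumption here. `FailoverController`, `failure_threshold`, and the recorded restore action are hypothetical placeholders for whatever backup and restoration tool the system actually invokes.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverController:
    """Hypothetical detector: trigger a restore on a standby cluster after
    `failure_threshold` consecutive failed health probes of the primary."""
    failure_threshold: int = 3
    consecutive_failures: int = 0
    failed_over: bool = False
    actions: list[str] = field(default_factory=list)

    def observe(self, primary_healthy: bool) -> bool:
        """Record one probe result; return True only on the probe that triggers failover."""
        if self.failed_over:
            return False  # recovery already started; ignore further probes
        if primary_healthy:
            self.consecutive_failures = 0  # a single healthy probe resets the count
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.failed_over = True
            # Placeholder for invoking the real backup/restoration tool.
            self.actions.append("restore latest backup on standby cluster")
            return True
        return False


def worst_case_detection_seconds(probe_interval_s: float, failure_threshold: int) -> float:
    """Upper bound on detection delay: the threshold's worth of probe intervals."""
    return probe_interval_s * failure_threshold
```

Requiring several consecutive failures trades detection speed for robustness against a single dropped probe. Under this assumed rule, a 2 s probe interval with a threshold of 3 spends up to 6 s on detection alone, which would have to fit inside any end-to-end recovery budget such as the reported 15 s.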
Taxonomy
Topics: Distributed and Parallel Computing Systems
