M$^2$-MFP: A Multi-Scale and Multi-Level Memory Failure Prediction Framework for Reliable Cloud Infrastructure
Hongyi Xie, Min Zhou, Qiao Yu, Jialiang Yu, Zhenli Sheng, Hong Xie, Defu Lian

TL;DR
M$^2$-MFP is a novel hierarchical framework that enhances memory failure prediction in cloud systems by combining multi-level feature extraction and interpretable temporal modeling, significantly outperforming existing methods.
Contribution
The paper introduces M$^2$-MFP, a multi-scale, hierarchical prediction framework that automatically extracts high-order features and employs dual-path temporal modeling for improved reliability.
Findings
Outperforms state-of-the-art methods on benchmark datasets.
Effective in real-world cloud infrastructure deployment.
Significantly higher recall and accuracy in failure prediction.
Abstract
As cloud services become increasingly integral to modern IT infrastructure, ensuring hardware reliability is essential to sustain high-quality service. Memory failures pose a significant threat to overall system stability, making accurate failure prediction through the analysis of memory error logs (i.e., Correctable Errors) imperative. Existing memory failure prediction approaches have notable limitations: rule-based expert models suffer from limited generalizability and low recall rates, while automated feature extraction methods exhibit suboptimal performance. To address these limitations, we propose M$^2$-MFP: a multi-scale and hierarchical memory failure prediction framework designed to enhance the reliability and availability of cloud infrastructure. M$^2$-MFP converts Correctable Errors (CEs) into multi-level binary matrix representations and introduces a Binary Spatial Feature…
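As a purely illustrative sketch of what a "multi-level binary matrix representation" of CEs might look like, the Python snippet below marks the locations that reported at least one CE in a small grid and then OR-pools that grid into a coarser level. This is not the paper's implementation: the `(row, col)` event format, the 8x8 grid, the pooling factor, and the function names `ce_to_binary_matrix` and `coarsen` are all assumptions made for this example.

```python
import numpy as np

def ce_to_binary_matrix(events, shape):
    """Set a cell to 1 if at least one CE was reported at that (row, col).

    `events` is a hypothetical list of (row, col) pairs; real CE logs
    carry different and richer fields.
    """
    matrix = np.zeros(shape, dtype=np.uint8)
    for row, col in events:
        matrix[row, col] = 1
    return matrix

def coarsen(matrix, factor):
    """OR-pool non-overlapping factor x factor blocks into a coarser level.

    Assumes both dimensions are divisible by `factor`.
    """
    rows, cols = matrix.shape
    blocks = matrix.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.max(axis=(1, 3))

# Hypothetical CE locations; the repeated (0, 1) still yields a single 1.
events = [(0, 1), (0, 1), (3, 2), (6, 7)]
fine = ce_to_binary_matrix(events, (8, 8))   # fine-grained level
coarse = coarsen(fine, 4)                    # coarser 2x2 level
```

Binary encoding discards error counts and keeps only the spatial pattern, which is one plausible reason to pair it with a separate temporal model, as the summary above describes.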
