# Automating Distributed Tiered Storage Management in Cluster Computing

**Authors:** Herodotos Herodotou, Elena Kakoulli

arXiv: 1907.02394 · 2020-06-22

## TL;DR

This paper presents a framework that automatically moves data across the storage tiers of a distributed file system. It uses machine-learned predictions of file access patterns to decide when and which data to move up or down, with the goal of improving workload performance and cluster efficiency.

## Contribution

The paper introduces a general framework for automated tiered storage management in distributed file systems such as HDFS. Its models are refined incrementally as new file accesses arrive, so data placement decisions adapt as the workload changes.

## Key findings

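- Significant workload performance benefits compared with several other policies, on realistic workloads derived from Facebook and CMU traces
- Improved cluster efficiency from automated data movement across storage tiers
- Incrementally refined models that adjust to workload changes over time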

## Abstract

Data-intensive platforms such as Hadoop and Spark are routinely used to process massive amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and new hardware technologies (e.g., NVRAM, SSDs) have recently led to the introduction of storage tiering in such settings. However, users are now burdened with the additional complexity of managing the multiple storage tiers and the data residing on them while trying to optimize their workloads. In this paper, we develop a general framework for automatically moving data across the available storage tiers in distributed file systems. Moreover, we employ machine learning for tracking and predicting file access patterns, which we use to decide when and which data to move up or down the storage tiers for increasing system performance. Our approach uses incremental learning to dynamically refine the models with new file accesses, allowing them to naturally adjust and adapt to workload changes over time. Our extensive evaluation using realistic workloads derived from Facebook and CMU traces compares our approach with several other policies and showcases significant benefits in terms of both workload performance and cluster efficiency.
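The idea of refining an access-prediction model with each new file access, then using its predictions to move data up or down, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's implementation: online logistic regression stands in for whatever model the authors use, the two features (a recency score and a frequency score), the two tier names `fast` and `slow`, the thresholds, and the names `OnlineAccessModel` and `plan_moves` are all hypothetical.

```python
import math


class OnlineAccessModel:
    """Logistic regression updated one observation at a time (plain SGD).

    Hypothetical stand-in for an incrementally trained access predictor.
    """

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        """Estimated probability that the file is accessed in the next window."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, accessed):
        """One gradient step on the log-loss for a single new observation."""
        error = self.predict_proba(features) - (1.0 if accessed else 0.0)
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error


def plan_moves(model, files, upgrade_at=0.7, downgrade_at=0.3):
    """Pick files to move between two assumed tiers, "fast" and "slow".

    `files` maps a file name to (features, current_tier).
    Returns (upgrades, downgrades) as lists of file names.
    """
    upgrades, downgrades = [], []
    for name, (features, tier) in files.items():
        p = model.predict_proba(features)
        if tier == "slow" and p >= upgrade_at:
            upgrades.append(name)      # likely to be read soon: move up
        elif tier == "fast" and p <= downgrade_at:
            downgrades.append(name)    # unlikely to be read soon: move down
    return upgrades, downgrades


# Feed the model a stream of observations: [recency_score, frequency_score].
model = OnlineAccessModel(n_features=2, learning_rate=0.5)
for _ in range(200):
    model.update([1.0, 1.0], accessed=True)    # recently and frequently used
    model.update([0.0, 0.0], accessed=False)   # old and rarely used

files = {
    "logs/today.parquet": ([1.0, 1.0], "slow"),
    "logs/2015.parquet": ([0.0, 0.0], "fast"),
}
upgrades, downgrades = plan_moves(model, files)
```

Because `update` consumes one observation at a time, the model never needs retraining from scratch: if the workload shifts, later observations gradually pull the weights toward the new pattern. Using separate upgrade and downgrade thresholds, rather than a single cutoff, is one simple way to avoid moving a file back and forth when its predicted probability hovers near the boundary.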

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.02394/full.md

## Figures

31 figures with captions in the complete paper: https://tomesphere.com/paper/1907.02394/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/1907.02394/full.md

---
Source: https://tomesphere.com/paper/1907.02394