HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
Salman Niazi (1), Mahmoud Ismail (1), Steffen Grohsschmiedt (2),, Mikael Ronstr\"om (3), Seif Haridi (1), Jim Dowling (1) ((1) KTH - Royal, Institute of Technology, (2) Spotify AB, (3) Oracle)

TL;DR
HopsFS leverages NewSQL databases to replace traditional in-memory metadata management in HDFS, significantly increasing scalability, throughput, and reliability for large distributed file systems.
Contribution
This paper introduces HopsFS, a scalable distributed file system that replaces HDFS metadata with a NewSQL database, enabling larger capacity and higher throughput.
Findings
Metadata capacity increased by at least 37 times.
Supports 16 to 37 times the throughput of HDFS.
Lower latency and no downtime during failover.
Abstract
Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS' single node in-memory metadata service, with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS enables an order of magnitude larger and higher throughput clusters compared to HDFS. Metadata capacity has been increased to at least 37 times HDFS' capacity, and in experiments based on a workload trace from Spotify, we show that HopsFS supports 16 to 37 times the throughput of Apache HDFS. HopsFS also has lower latency for many concurrent clients, and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
