Building Block Components to Control a Data Rate in the Apache Hadoop Compute Platform
Tien Van Do (1), Binh T. Vu (1), Nam H. Do (1), Lóránt Farkas (2), Csaba Rotter (2), Tamás Tarjányi (2) ((1) Budapest University of Technology and Economics, (2) Nokia)

TL;DR
This paper introduces building block components that enable data rate control in Hadoop's compute platform by managing data pipes between containers and DataNodes, enhancing resource management beyond CPU and RAM.
Contribution
It presents a novel solution for controlling data rates in Hadoop by managing data pipes, extending resource management capabilities in YARN.
Findings
Implemented data rate control components successfully
Demonstrated effective data pipe management with measurements
Enhanced resource control in Hadoop environment
Abstract
Resource management is an indispensable component of cluster-level infrastructure layers. Users of such systems should be able to specify their job requirements as configuration parameters (CPU, RAM, disk I/O, network I/O) and have the scheduler translate them into an appropriate reservation and allocation of resources. YARN is an emerging resource management framework in the Hadoop ecosystem that at present supports only RAM and CPU reservation. In this paper, we propose a solution that takes into account the operation of the Hadoop Distributed File System to control the data rate of applications in the framework of a Hadoop compute platform. We exploit the property that a data pipe between a container and a DataNode consists of a disk I/O subpipe and a TCP/IP subpipe. We have implemented building block software components to control the data rate of data pipes between…
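The abstract states that a container-to-DataNode data pipe consists of a disk I/O subpipe and a TCP/IP subpipe, but it does not spell out how each subpipe is throttled. The sketch below is a hypothetical illustration only: it assumes each subpipe is limited by a token bucket (a standard rate-limiting technique, not necessarily the mechanism the authors implemented) and uses a virtual clock so the result is deterministic. The names `TokenBucket`, `DataPipe`, `reserve`, and `transfer` are invented for this example. It shows why a pipe built from two subpipes in series cannot run faster than its slower subpipe, which is why both subpipes must be controlled to enforce a data rate.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical subpipe limiter: refills `rate` bytes/s, holds at most `capacity` bytes."""
    rate: float
    capacity: float

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full burst allowance
        self.clock = 0.0             # virtual time (s) of the last update

    def reserve(self, nbytes: int, now: float) -> float:
        """Return the earliest time >= now at which `nbytes` may pass, and debit them."""
        if nbytes > self.capacity:
            raise ValueError("chunk larger than bucket capacity")
        now = max(now, self.clock)
        # Refill tokens for the time elapsed since the last update.
        self.tokens = min(self.capacity, self.tokens + (now - self.clock) * self.rate)
        self.clock = now
        if self.tokens < nbytes:
            # Wait until exactly enough tokens have accumulated.
            now += (nbytes - self.tokens) / self.rate
            self.tokens = float(nbytes)
            self.clock = now
        self.tokens -= nbytes
        return now


@dataclass
class DataPipe:
    """Hypothetical container-to-DataNode pipe: disk I/O subpipe followed by TCP/IP subpipe."""
    disk: TokenBucket
    net: TokenBucket

    def transfer(self, total: int, chunk: int) -> float:
        """Send `total` bytes in `chunk`-sized pieces; return the virtual completion time."""
        t, sent = 0.0, 0
        while sent < total:
            n = min(chunk, total - sent)
            t = self.disk.reserve(n, t)  # chunk is first read through the disk subpipe
            t = self.net.reserve(n, t)   # then forwarded through the TCP/IP subpipe
            sent += n
        return t


if __name__ == "__main__":
    pipe = DataPipe(disk=TokenBucket(rate=100, capacity=10),
                    net=TokenBucket(rate=50, capacity=10))
    print(round(pipe.transfer(1000, 10), 6))  # prints 19.8
```

With a disk limit of 100 bytes/s and a network limit of 50 bytes/s, 1000 bytes finish at about 19.8 s, roughly the 50 bytes/s of the slower subpipe (the small difference comes from the initial burst allowance). Swapping the two limits yields the same completion time, so throttling only one subpipe would not bound the pipe's rate if the other were the bottleneck.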
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Caching and Content Delivery
