A Scalable Stream-Oriented Framework for Cluster Applications
Tassos S. Argyros, David R. Cheriton

TL;DR
This paper introduces a stream-oriented framework for structuring cluster applications. It combines stream-based data flow, load balancing, pushed data, and transparent handling of node failures so that clusters can scale to tens of thousands of nodes with less performance loss and fewer reliability problems.
Contribution
It proposes a novel architecture that improves scalability and fault tolerance in cluster applications by exploiting stream processing principles.
Findings
Supports scaling to tens of thousands of nodes
Incurs significantly less performance loss and fewer reliability problems at that scale
Runs simple data mining applications on a cluster simulator; the implementation is ongoing work
Abstract
This paper presents a stream-oriented architecture for structuring cluster applications. Clusters that run applications based on this architecture can scale to tens of thousands of nodes with significantly less performance loss and fewer reliability problems. Our architecture exploits the stream nature of the data flow: it reduces congestion through load balancing, hides latency behind data pushes, and transparently handles node failures. In our ongoing work, we are developing an implementation of this architecture, and we are able to run simple data mining applications on a cluster simulator.
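The abstract names three mechanisms (load balancing, data pushes, and transparent handling of node failures) without describing how they are implemented. The following is a minimal illustrative sketch of how those ideas could fit together, not the authors' design. The `Node` and `StreamDispatcher` classes, the least-loaded selection policy, and the re-push on failure are all assumptions introduced here for illustration.

```python
from collections import deque


class Node:
    """Hypothetical worker that receives pushed records in an inbox queue."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.alive = True


class StreamDispatcher:
    """Pushes records to the least-loaded live node (an assumed policy)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def push(self, record):
        live = [n for n in self.nodes if n.alive]
        if not live:
            raise RuntimeError("no live nodes available")
        # Load balancing: choose the live node with the shortest inbox.
        target = min(live, key=lambda n: len(n.inbox))
        # Data push: the record is delivered before the node asks for it.
        target.inbox.append(record)
        return target.name

    def fail(self, name):
        """Mark a node as failed and re-push its pending records elsewhere."""
        node = next(n for n in self.nodes if n.name == name)
        node.alive = False
        pending = list(node.inbox)
        node.inbox.clear()
        for record in pending:
            self.push(record)


nodes = [Node("a"), Node("b"), Node("c")]
dispatcher = StreamDispatcher(nodes)
for i in range(6):
    dispatcher.push(i)

# Simulate a node failure; its queued records move to the surviving nodes.
dispatcher.fail("b")
```

In this sketch the producer never waits for a consumer request, which is one simple way to read "hides latency behind data pushes". A real system would also need flow control and failure detection, which the abstract does not detail.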
Taxonomy
Topics: Data Mining Algorithms and Applications · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
