Scalable Distributed Job Processing with Dynamic Load Balancing
Putti Srinivasrao, V. P. C. Rao, A. Govardhan, Ambika Prasad Mohanty

TL;DR
This paper introduces a cost-effective, scalable distributed job processing framework that dynamically balances load, ensures fault tolerance, and adapts to heterogeneous systems without job migration, using a message bus and centralized monitoring.
Contribution
It presents a novel distributed job processing system with integrated load balancing, fault tolerance, and scalability features, avoiding job migration and supporting heterogeneous environments.
Findings
System effectively balances load in heterogeneous environments
Supports fault tolerance and failover recovery
Scales horizontally and vertically to optimize performance
Abstract
We present a cost-effective framework for a robust, scalable, distributed job processing system that adapts easily to dynamic computing needs and balances load efficiently across heterogeneous systems. Each component is self-contained and does not depend on the others. The components are nevertheless interconnected through an enterprise message bus, which provides safe, secure, and reliable communication and uses transactional features to avoid duplication and data loss. Load balancing, fault tolerance, and failover recovery are built into the system through a health-check facility and queue-based load balancing. A centralized repository with central monitors tracks the progress of job executions and the status of processors in real time. The basic requirement of assigning a priority and processing as…
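The abstract names queue-based load balancing, health checks, and failover without job migration, but this summary gives no implementation detail. The following is a minimal in-memory sketch of that general idea, not the authors' design. All names (`Job`, `Processor`, `run`, `fail_at`) are hypothetical, a Python heap stands in for the message-bus queue, and re-queuing an unfinished job is an assumed stand-in for a transactional rollback.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Job:
    priority: int                      # lower value is served first
    job_id: str                        # also breaks priority ties deterministically
    cost: int = field(compare=False)   # abstract work units


@dataclass
class Processor:
    name: str
    speed: int                         # work units completed per tick (heterogeneous nodes)
    healthy: bool = True
    current: Optional[Job] = None
    remaining: int = 0
    done: List[str] = field(default_factory=list)


def run(jobs, processors, fail_at=None):
    """Simulate pull-based processing. fail_at maps a processor name to the
    tick at which the (assumed) health check finds it dead."""
    fail_at = fail_at or {}
    queue = list(jobs)
    heapq.heapify(queue)               # shared priority queue
    tick = 0
    while any(p.healthy for p in processors) and (
        queue or any(p.current is not None for p in processors)
    ):
        for p in processors:
            # Health check: a failed processor returns its unfinished job
            # to the queue, so the job is neither lost nor duplicated.
            if p.healthy and fail_at.get(p.name) == tick:
                p.healthy = False
                if p.current is not None:
                    heapq.heappush(queue, p.current)
                    p.current = None
            if not p.healthy:
                continue
            # An idle processor pulls the next job; nothing is pushed to it,
            # so a running job never has to migrate between nodes.
            if p.current is None and queue:
                p.current = heapq.heappop(queue)
                p.remaining = p.current.cost
            if p.current is not None:
                p.remaining -= p.speed
                if p.remaining <= 0:
                    p.done.append(p.current.job_id)
                    p.current = None
        tick += 1
    return {p.name: p.done for p in processors}


if __name__ == "__main__":
    jobs = [Job(1, f"j{i}", 4) for i in range(6)]
    nodes = [Processor("fast", 4), Processor("slow", 1)]
    print(run(jobs, nodes))
```

Because each processor pulls work only when idle, the faster node ends up completing most of the jobs without any central scheduler estimating node capacity. Passing `fail_at={"slow": 2}` shows the failover path: the job held by the failed node is re-queued and completed by the surviving one.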
Taxonomy
Topics: Distributed and Parallel Computing Systems · Distributed systems and fault tolerance · Real-Time Systems Scheduling
