Runtime QoS service for application-driven adaptation in network computing
Feras Al-Hawari, Elias Manolakos

TL;DR
This paper presents a runtime QoS service with lightweight middleware for application-driven adaptation in networked environments, enabling adaptation for performance and fault tolerance with minimal overhead.
Contribution
It introduces a QoS middleware and API that facilitate dynamic adaptation and fault tolerance in distributed applications on a Network of Workstations.
Findings
The QoS middleware has only a minor performance impact.
The API enables effective fault tolerance and scheduling.
Application adaptation improves resilience and efficiency.
Abstract
A distributed application executing on a Network of Workstations (NOW) needs to be aware of resource state so that it can adapt itself and keep satisfying its desired Quality of Service (QoS) demands throughout its lifespan. We implemented a QoS service that enables application-driven adaptation for performance and fault tolerance at runtime. The service is associated with lightweight middleware that monitors the state and load of all application entities (e.g., machines, tasks, and logical network links). Moreover, it makes its services available to an application task via an anonymous, simple-to-use QoS API. We present a Manager-Worker application that uses our fault tolerance QoS API to adapt to Worker faults and thereby avoid application deadlock at runtime. We also show how a dynamic application-level scheduler can easily utilize the QoS API to find efficient…
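The abstract describes a Manager that avoids deadlock by reacting to Worker faults reported by the QoS service, but it does not show the API itself. The following is a minimal sketch under stated assumptions: `SimulatedQoSService`, its `is_alive` query, `run_manager`, and the fault-injection schedule are all hypothetical names, and Worker replies are simulated synchronously in-process rather than exchanged over a network. The idea it illustrates is that the Manager requeues any task held by a Worker the QoS service reports as failed, instead of blocking forever on a reply that will never arrive.

```python
from collections import deque
from typing import Callable, Dict, List, Set


class SimulatedQoSService:
    """Hypothetical stand-in for the paper's QoS service. It only answers
    whether a Worker is still alive; the real service also tracks load."""

    def __init__(self) -> None:
        self.failed: Set[str] = set()

    def mark_failed(self, worker: str) -> None:
        self.failed.add(worker)

    def is_alive(self, worker: str) -> bool:
        return worker not in self.failed


def run_manager(tasks: List[int], workers: List[str],
                qos: SimulatedQoSService,
                compute: Callable[[int], int],
                fail_schedule: Dict[int, str]) -> Dict[int, int]:
    """Farm tasks out to Workers and adapt to Worker faults by requeueing
    the task a failed Worker was holding (hypothetical sketch)."""
    pending = deque(tasks)
    assigned: Dict[str, int] = {}  # worker -> task it currently holds
    results: Dict[int, int] = {}
    step = 0
    while pending or assigned:
        # Inject a simulated fault at a given step (demonstration only).
        if step in fail_schedule:
            qos.mark_failed(fail_schedule[step])
        step += 1
        # Hand out work to idle Workers that the QoS service reports alive.
        for w in workers:
            if w not in assigned and pending and qos.is_alive(w):
                assigned[w] = pending.popleft()
        # Adapt: requeue tasks held by Workers reported as failed.
        for w in list(assigned):
            if not qos.is_alive(w):
                pending.appendleft(assigned.pop(w))
        if not assigned:
            # Only reachable with pending work if every Worker has failed.
            if pending:
                raise RuntimeError("no live workers left")
            continue
        # Collect one simulated reply per step from a live Worker.
        w = next(iter(assigned))
        task = assigned.pop(w)
        results[task] = compute(task)
    return results


if __name__ == "__main__":
    qos = SimulatedQoSService()
    out = run_manager([1, 2, 3, 4, 5, 6], ["w1", "w2", "w3"], qos,
                      lambda t: t * t, {1: "w2"})
    print(sorted(out.items()))
```

In this run, Worker `w2` fails while holding task 2; the Manager notices the fault through the QoS query, hands task 2 to another live Worker, and still completes all six tasks. Without that check, a Manager waiting on `w2`'s reply would block indefinitely, which is the deadlock the abstract refers to.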
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Distributed systems and fault tolerance
