Timeout Control in Distributed Systems Using Perturbation Analysis: Multiple Communication Links
Ali Kebarighotbi, Christos G. Cassandras

TL;DR
This paper models timeout control in a distributed system where multiple nodes share one communication link, and uses perturbation analysis to estimate how goodput responds to the timeout thresholds so that those thresholds can be tuned.
Contribution
It extends earlier models to N nodes sharing a common link bandwidth and derives derivative estimates that support distributed optimization of the timeout thresholds.
Findings
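Derived a stochastic hybrid model for N nodes sharing a link
Applied Infinitesimal Perturbation Analysis (IPA) to estimate derivatives of the aggregate average goodput
Derived each transmitter's goodput derivative with respect to its own timeout threshold, enabling local optimization of the thresholds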
Abstract
Timeout control is a simple mechanism used when direct feedback is impossible, unreliable, or too costly, as is often the case in distributed systems. Its effectiveness is determined by a timeout threshold parameter, and our goal is to quantify the effect of this parameter on system behavior. In this paper, we extend previous results to the case where N transmitting nodes share a common communication link bandwidth. After deriving the stochastic hybrid model for this problem, we apply Infinitesimal Perturbation Analysis to obtain derivative estimates of the aggregate average goodput of the system. We also derive the derivative estimate of a transmitter's goodput with respect to its own timeout threshold, which can be used for local, and hence distributed, optimization.
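The abstract names Infinitesimal Perturbation Analysis (IPA) without showing what it computes. As a minimal sketch of the core idea only, under assumptions that are ours and not the paper's, consider a single sender whose ACK delay D is drawn i.i.d. from an exponential distribution with rate mu, and take as sample performance min(D, theta), the time spent waiting per attempt until either the ACK arrives or the timeout theta fires. IPA differentiates this along each sample path: the derivative is 1 when the timeout fires and 0 otherwise, and the average of these per-sample derivatives estimates dE[min(D, theta)]/dtheta. The function names are hypothetical, and this is not the paper's stochastic hybrid model or its goodput estimator.

```python
import math
import random


def ipa_mean_wait_derivative(theta, mu, n_samples, seed=0):
    """IPA estimate of d/dtheta E[min(D, theta)] with D ~ Exp(mu).

    Toy single-sender model (an assumption, not the paper's model): the
    sender waits min(D, theta) per attempt, where D is the ACK delay and
    theta the timeout threshold. On a fixed sample path,
    d/dtheta min(D, theta) is 1 if the timeout fires (D > theta), else 0.
    """
    rng = random.Random(seed)
    timeouts = 0
    for _ in range(n_samples):
        delay = rng.expovariate(mu)  # ACK delay for this attempt (rate mu)
        if delay > theta:            # timeout fired: waiting time grows 1:1 with theta
            timeouts += 1
    return timeouts / n_samples      # average of the per-sample derivatives


def mean_wait(theta, mu, n_samples, seed=0):
    """Sample average of min(D, theta), the mean waiting time per attempt."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += min(rng.expovariate(mu), theta)
    return total / n_samples


def finite_difference(theta, mu, n_samples, h=1e-3, seed=0):
    """Forward difference using common random numbers (same seed, same delays)."""
    return (mean_wait(theta + h, mu, n_samples, seed)
            - mean_wait(theta, mu, n_samples, seed)) / h


def exact_derivative(theta, mu):
    """For Exp(mu), E[min(D, theta)] = (1 - exp(-mu*theta)) / mu, so the
    derivative is exp(-mu*theta), which equals P(D > theta)."""
    return math.exp(-mu * theta)


if __name__ == "__main__":
    theta, mu, n = 0.5, 2.0, 200_000
    print(f"IPA estimate:      {ipa_mean_wait_derivative(theta, mu, n):.4f}")
    print(f"finite difference: {finite_difference(theta, mu, n):.4f}")
    print(f"exact value:       {exact_derivative(theta, mu):.4f}")
```

The three printed values should agree to within sampling error. The IPA estimate needs a single simulation run, while the finite difference needs two and a choice of step h. Because min(D, theta) is continuous and piecewise linear in theta, the pathwise derivative is unbiased in this toy case. Quantities that jump as theta changes, such as a count of successful transmissions, have a sample derivative of zero almost everywhere, so applying IPA to them naively would give a biased estimate.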
