Probe Scheduling for Efficient Detection of Silent Failures

Edith Cohen; Avinatan Hassidim; Haim Kaplan; Yishay Mansour; Danny Raz; Yoav Tzur

arXiv:1302.0792·cs.NI·June 20, 2014

TL;DR

This paper develops and analyzes probe scheduling strategies for detecting silent failures in networks, trading off probing overhead against detection time with both stochastic and deterministic schedules.

Contribution

It introduces a unified model for probe scheduling, provides efficient algorithms for memoryless schedules, and develops new deterministic schedulers with provable approximation guarantees.
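The excerpt does not spell out the authors' deterministic schedulers. As a hedged illustration of how a deterministic cyclic schedule can guarantee a worst-case detection time, here is a minimal sketch of a textbook technique (power-of-two rounding, as used for pinwheel-style scheduling), not the paper's algorithm. It assumes the simplest case: each probe covers exactly one element, one probe is sent per time step, and element e must be probed at least once every `max_gap[e]` steps. The function `power_of_two_schedule` and its inputs are hypothetical.

```python
from typing import Dict, List, Optional


def power_of_two_schedule(max_gap: Dict[str, int]) -> List[Optional[str]]:
    """Hypothetical helper: cyclic schedule probing element e at least every max_gap[e] steps.

    Assumes one single-element probe per time step; None marks an idle step.
    """
    if any(g < 1 for g in max_gap.values()):
        raise ValueError("every max_gap must be a positive integer")
    # Round each allowed gap down to a power of two (loses at most a factor of 2 in rate).
    period = {e: 1 << (g.bit_length() - 1) for e, g in max_gap.items()}
    if sum(1 / p for p in period.values()) > 1:
        raise ValueError("rounded probing rates exceed one probe per step")
    cycle = max(period.values())
    slots: List[Optional[str]] = [None] * cycle
    # Place short periods first: every earlier period divides the current one, so each
    # residue class mod p is either completely free or completely taken.
    for e in sorted(period, key=period.get):
        p = period[e]
        r = next(r for r in range(p) if slots[r] is None)  # exists since total rate <= 1
        for t in range(r, cycle, p):
            slots[t] = e
    return slots
```

For example, `power_of_two_schedule({"a": 2, "b": 5, "c": 8})` returns `["a", "b", "a", "c", "a", "b", "a", None]`, repeated cyclically, so each element's worst-case gap is its rounded period. Because rounding down to a power of two at most halves each period, any instance with a total rate of at most 1/2 (the sum of 1/max_gap[e] over all elements) passes the feasibility check; this factor-of-2 slack is the kind of approximation guarantee such constructions trade for simplicity.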

Findings

1. Memoryless schedules can be optimized efficiently via convex or linear programming.
2. Deterministic schedules with bounded worst-case detection times can be constructed with provable approximation guarantees.
3. Simulation results show significant improvements over previous methods.
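The convex and linear programs behind the first finding can be written out under a simple assumed model (notation ours, not necessarily the paper's): probe i checks a set P_i of elements, element e carries a weight w_e (for example its failure likelihood or importance), and a memoryless schedule draws one probe per step independently from a distribution q. A failed element is then detected after a geometrically distributed number of steps. Reading MAX as the largest weighted expected detection time is also our assumption.

```latex
x_e(q) = \sum_{i\,:\,e \in P_i} q_i, \qquad
\mathbb{E}[T_e] = \frac{1}{x_e(q)}, \qquad
\Delta = \Big\{ q \ge 0 : \sum_i q_i = 1 \Big\}

\text{SUM:}\quad \min_{q \in \Delta} \sum_e \frac{w_e}{x_e(q)}
  \qquad (1/x \text{ is convex on } x > 0 \text{ and } x_e(q) \text{ is linear in } q)

\text{MAX:}\quad \min_{q \in \Delta} \max_e \frac{w_e}{x_e(q)}
  \;\Longleftrightarrow\;
  \max_{q \in \Delta,\; t \ge 0} t \quad \text{s.t.} \quad x_e(q) \ge t\, w_e \;\; \forall e
  \qquad (\text{optimal MAX cost} = 1/t^\ast)

\text{Singleton probes } (P_e = \{e\}):\quad
  \Big(\sum_e \sqrt{w_e}\Big)^2
  = \Big(\sum_e \sqrt{\tfrac{w_e}{q_e}}\,\sqrt{q_e}\Big)^2
  \le \Big(\sum_e \frac{w_e}{q_e}\Big)\Big(\sum_e q_e\Big)
  = \sum_e \frac{w_e}{q_e}

\Rightarrow\quad
  q_e^{\mathrm{SUM}} = \frac{\sqrt{w_e}}{\sum_f \sqrt{w_f}}, \;\;
  \mathrm{SUM}^\ast = \Big(\sum_e \sqrt{w_e}\Big)^2, \qquad
  q_e^{\mathrm{MAX}} = \frac{w_e}{\sum_f w_f}, \;\;
  \mathrm{MAX}^\ast = \sum_e w_e
```

The Cauchy-Schwarz bound is tight exactly when q_e is proportional to the square root of w_e, so under SUM heavily weighted elements are probed more often, but only in proportion to the square root of their weight; under MAX the allocation is proportional to the weight itself.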

Abstract

Most discovery systems for silent failures work in two phases: a continuous monitoring phase that detects the presence of failures through probe packets and a localization phase that pinpoints the faulty element(s). This separation is important because localization requires significantly more resources than detection and should be initiated only when a fault is present. We focus on improving the efficiency of the detection phase, where the goal is to balance the probing overhead against the cost associated with longer failure detection times. We formulate a general model which unifies the treatment of probe scheduling mechanisms, stochastic or deterministic, and different cost objectives: minimizing average detection time (SUM) or worst-case detection time (MAX). We then focus on two classes of schedules. Memoryless schedules, a subclass of stochastic schedules which is simple and…
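To make the SUM objective for memoryless schedules concrete, here is a minimal sketch under stated assumptions (not the authors' code): probe i checks the element set `probes[i]`, element e has weight `w[e]`, and each time step sends one probe drawn independently from a distribution q, so element e's expected detection time is 1 / x[e], where x[e] is the total probability of probes covering e. All function names are hypothetical, and the solver is a generic swap-in: Frank-Wolfe with exact line search, which keeps q on the probability simplex without a projection step.

```python
import math
from typing import List, Sequence, Set


def coverage(q: Sequence[float], probes: Sequence[Set[int]], n: int) -> List[float]:
    """x[e]: probability that a single draw from q sends a probe covering element e."""
    x = [0.0] * n
    for qi, probe in zip(q, probes):
        for e in probe:
            x[e] += qi
    return x


def sum_cost(x: Sequence[float], w: Sequence[float]) -> float:
    """SUM objective: detection time of e is geometric with mean 1 / x[e]."""
    if any(xe <= 0.0 for xe in x):
        return math.inf  # some element is never probed
    return sum(we / xe for we, xe in zip(w, x))


def optimize_memoryless_sum(probes: Sequence[Set[int]], w: Sequence[float],
                            iters: int = 2000) -> List[float]:
    """Hypothetical solver: Frank-Wolfe with exact line search over the simplex."""
    n, m = len(w), len(probes)
    if set().union(*probes) != set(range(n)):
        raise ValueError("every element must be covered by some probe")
    q = [1.0 / m] * m  # uniform start covers every element
    for _ in range(iters):
        x = coverage(q, probes, n)
        # Gradient of sum_e w[e] / x[e] with respect to q[i].
        grad = [-sum(w[e] / x[e] ** 2 for e in probe) for probe in probes]
        j = min(range(m), key=grad.__getitem__)  # simplex vertex minimizing the linearization
        hit = [1.0 if e in probes[j] else 0.0 for e in range(n)]

        def phi(g: float) -> float:
            # Cost at (1 - g) * q + g * e_j; x is affine in g, so phi is convex.
            return sum_cost([(1 - g) * xe + g * he for xe, he in zip(x, hit)], w)

        lo, hi = 0.0, 1.0
        for _ in range(60):  # ternary search on the convex function phi
            m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
            if phi(m1) <= phi(m2):
                hi = m2
            else:
                lo = m1
        g = (lo + hi) / 2
        q = [(1 - g) * qi for qi in q]
        q[j] += g
    return q
```

As a sanity check, with singleton probes `[{0}, {1}, {2}]` and weights `[1.0, 4.0, 9.0]` the result should be close to `[1/6, 1/3, 1/2]`, the square-root allocation that minimizes SUM in that special case. Any convex solver would serve; Frank-Wolfe is chosen here only because it needs nothing beyond the standard library.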
