Untangling the Braid: Finding Outliers in a Set of Streams
Chiranjeeb Buragohain, Luca Foschini, Subhash Suri

TL;DR
This paper explores the challenge of identifying outliers in streaming data from shared computing systems, establishing theoretical limits and proposing heuristics that perform well in practice.
Contribution
It provides the first space complexity bounds for outlier detection in streams, proves lower bounds for natural measures, and introduces heuristics with promising empirical results.
Findings
Theoretical lower bounds for approximating outliers in streams.
Heuristics perform well on synthetic data.
Outliers are easy to track for simple measures such as max or min.
Abstract
Monitoring the performance of large shared computing systems such as the cloud computing infrastructure raises many challenging algorithmic problems. One common problem is to track users with the largest deviation from the norm (outliers), for some measure of performance. Taking a stream-computing perspective, we can think of each user's performance profile as a stream of numbers (such as response times), and the aggregate performance profile of the shared infrastructure as a "braid" of these intermixed streams. The monitoring system's goal then is to untangle this braid sufficiently to track the top k outliers. This paper investigates the space complexity of one-pass algorithms for approximating outliers of this kind, proves lower bounds using multi-party communication complexity, and proposes small-memory heuristic algorithms. On one hand, stream outliers are easily tracked for simple…
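To make the "braid" model concrete, here is a minimal Python sketch of the easy case the abstract mentions: tracking the top k users when the per-user measure is the maximum value seen. It is an illustration under stated assumptions, not the paper's algorithm. The function name `top_k_by_max`, the `(user, value)` tuple encoding of the braid, and the sample data are all hypothetical. Because a user's maximum never decreases, it is enough to remember only the k users with the largest values seen so far, so memory stays proportional to k rather than to the number of users.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def top_k_by_max(
    braid: Iterable[Tuple[Hashable, float]], k: int
) -> List[Tuple[Hashable, float]]:
    """One pass over an interleaved stream of (user, value) items.

    Keeps at most k users in memory: those whose largest observed
    value is currently among the k largest. Hypothetical helper for
    illustration only; ties are broken arbitrarily.
    """
    if k <= 0:
        return []

    tracked: Dict[Hashable, float] = {}  # user -> largest value seen so far
    for user, value in braid:
        if user in tracked:
            # Already a candidate: a max can only grow.
            if value > tracked[user]:
                tracked[user] = value
        elif len(tracked) < k:
            # Still room for another candidate.
            tracked[user] = value
        else:
            # Evict the weakest candidate if this item beats it.
            weakest = min(tracked, key=tracked.get)
            if value > tracked[weakest]:
                del tracked[weakest]
                tracked[user] = value

    # Report candidates from largest to smallest maximum.
    return sorted(tracked.items(), key=lambda kv: kv[1], reverse=True)


# Usage: response times from four users, intermixed into one stream.
sample_braid = [
    ("a", 5), ("b", 3), ("c", 9), ("a", 12),
    ("b", 4), ("d", 7), ("b", 20),
]
result = top_k_by_max(sample_braid, k=2)
```

An evicted user can re-enter later if it produces a value larger than the current weakest candidate, which is why the sketch stays exact for max. The same argument does not carry over to measures that depend on a user's whole history, since one new item no longer determines the user's score.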
Taxonomy
Topics: Data Stream Mining Techniques · Data Management and Algorithms · Anomaly Detection Techniques and Applications
