RAD: On-line Anomaly Detection for Highly Unreliable Data
Zilong Zhao, Robert Birke, Rui Han, Bogdan Robu, Sara Bouchenak, Sonia Ben Mokhtar, Lydia Y. Chen

TL;DR
This paper introduces RAD, an online learning framework that robustly detects anomalies in unreliable data sources by filtering suspicious data and learning from multiple data streams, improving accuracy in IoT, cloud, and face recognition applications.
Contribution
RAD is a novel two-layer online framework that effectively handles unreliable labels and adapts through continuous data cleansing and oracle knowledge integration.
Findings
Achieves up to 98% accuracy in IoT attack detection.
Improves cloud task failure prediction accuracy by 20%.
Enhances face recognition accuracy by 28% under noisy labels.
Abstract
Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we present a two-layer on-line learning framework for robust anomaly detection (RAD) in the presence of unreliable anomaly labels, where the first layer is to filter out the suspicious data, and the second layer detects the anomaly patterns from the remaining data. To adapt to the on-line nature of anomaly detection, we extend RAD with additional features of repetitively cleaning, conflicting opinions of classifiers, and oracle knowledge. We on-line learn from the incoming data streams and…
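The two-layer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `CentroidClassifier` and `TwoLayerDetector`, the nearest-centroid model, and the assumption that the first batch is trusted are all hypothetical choices made to keep the example dependency-free. Layer 1 predicts a label for each incoming sample and drops samples whose given label disagrees with the prediction; layer 2 is retrained on the data that survives.

```python
from statistics import mean


class CentroidClassifier:
    """Tiny nearest-centroid classifier, a stand-in for any real model."""

    def __init__(self):
        self.centroids = {}

    def fit(self, X, y):
        # One centroid per label: the column-wise mean of that label's rows.
        self.centroids = {}
        for label in set(y):
            rows = [x for x, lbl in zip(X, y) if lbl == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def dist2(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist2)


class TwoLayerDetector:
    """Hypothetical sketch: layer 1 filters suspicious labels, layer 2 detects."""

    def __init__(self):
        self.label_model = CentroidClassifier()  # layer 1: label quality
        self.detector = CentroidClassifier()     # layer 2: anomaly detection
        self.clean_X, self.clean_y = [], []

    def _refit(self):
        self.label_model.fit(self.clean_X, self.clean_y)
        self.detector.fit(self.clean_X, self.clean_y)

    def bootstrap(self, X, y):
        # Assumption of this sketch: the initial batch has trusted labels.
        self.clean_X, self.clean_y = list(X), list(y)
        self._refit()

    def update(self, X, y):
        """Process one incoming batch; return how many samples were kept."""
        kept = 0
        for x, label in zip(X, y):
            # Keep a sample only if layer 1 agrees with its given label.
            if self.label_model.predict_one(x) == label:
                self.clean_X.append(x)
                self.clean_y.append(label)
                kept += 1
        self._refit()
        return kept

    def predict(self, X):
        return [self.detector.predict_one(x) for x in X]
```

Here both layers use the same model family, so they end up identical after each refit; in practice the two layers could be different classifiers. A typical call sequence is `bootstrap` once on a trusted batch, then `update` for every arriving batch, with `predict` available at any point in between.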
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
