Enhancing Robustness of On-line Learning Models on Highly Noisy Data

Zilong Zhao; Robert Birke; Rui Han; Bogdan Robu; Sara Bouchenak; Sonia Ben Mokhtar; Lydia Y. Chen

arXiv:2103.10824 · cs.LG · March 22, 2021

TL;DR

This paper introduces a robust online anomaly detection framework, RAD, that effectively handles highly noisy data across various applications by continuously cleansing data and leveraging ensemble predictions.

Contribution

The paper proposes a novel ensemble-based online data selection framework, RAD, with extensions for adaptive learning and oracle knowledge, improving robustness against noisy labels in anomaly detection.

Findings

01. Achieves up to 98.95% accuracy in IoT attack detection under 40% label noise.

02. Improves cloud task failure prediction accuracy by 14% under 40% label noise.

03. Reaches 77.51% accuracy in face recognition under 30% label noise.

Abstract

Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this paper, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity…
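The two-layer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid model, the small trusted seed set, the agreement-based selection rule, and the names `CentroidClassifier`, `run_stream`, `ensemble_predict`, and `make_batch` are all assumptions made for illustration. Layer 1 keeps only the incoming samples whose given label matches its own prediction, layer 2 trains on that cleansed stream, and both layers vote on the final decision.

```python
import random


class CentroidClassifier:
    """Tiny nearest-centroid model; a stand-in for any real classifier."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of samples seen

    def partial_fit(self, X, y):
        for x, label in zip(X, y):
            if label not in self.sums:
                self.sums[label] = [0.0] * len(x)
                self.counts[label] = 0
            self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
            self.counts[label] += 1

    def scores(self, x):
        """Negative squared distance to each class centroid (higher is better)."""
        out = {}
        for label, s in self.sums.items():
            n = self.counts[label]
            out[label] = -sum((v - si / n) ** 2 for v, si in zip(x, s))
        return out

    def predict(self, x):
        sc = self.scores(x)
        return max(sc, key=sc.get)


def ensemble_predict(layer1, layer2, x):
    """Sum the per-class scores of both layers and pick the best class."""
    s1, s2 = layer1.scores(x), layer2.scores(x)
    # If layer 2 has not seen a class yet, reuse layer 1's score for it.
    combined = {label: s1[label] + s2.get(label, s1[label]) for label in s1}
    return max(combined, key=combined.get)


def run_stream(batches, seed_X, seed_y):
    """Process (X, noisy_y) batches one at a time, cleansing each one."""
    layer1 = CentroidClassifier()  # label-quality model (assumed seeded with trusted data)
    layer2 = CentroidClassifier()  # anomaly classifier, trained on cleansed data only
    layer1.partial_fit(seed_X, seed_y)
    for X, y in batches:
        # Keep samples whose given label agrees with layer 1's prediction.
        kept = [(x, label) for x, label in zip(X, y) if layer1.predict(x) == label]
        if kept:
            Xc, yc = zip(*kept)
            layer1.partial_fit(Xc, yc)
            layer2.partial_fit(Xc, yc)
    return layer1, layer2


def make_batch(rng, n, noise_rate):
    """Synthetic two-class data; a fraction of the given labels is flipped."""
    X, noisy_y, true_y = [], [], []
    for i in range(n):
        label = i % 2  # alternate classes so every batch is balanced
        X.append([rng.gauss(5.0 * label, 1.0), rng.gauss(5.0 * label, 1.0)])
        flipped = rng.random() < noise_rate
        noisy_y.append(1 - label if flipped else label)
        true_y.append(label)
    return X, noisy_y, true_y
```

A possible usage: build a clean seed set with `make_batch(rng, 20, 0.0)`, stream several batches generated with `noise_rate=0.4` through `run_stream`, and score held-out points with `ensemble_predict`. In a real system the two layers would typically be different, stronger models, and the extensions the abstract mentions (conflicting opinions, repetitive cleaning, oracle knowledge) are not modeled here.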

Code & Models

Repositories

zhao-zilong/MotivationCaseStudies (tf, Official)
