On the Reduction of Biases in Big Data Sets for the Detection of Irregular Power Usage

Patrick Glauner; Radu State; Petko Valtchev; Diogo Duarte

arXiv:1801.05627·cs.LG·April 4, 2018

TL;DR

This paper introduces a scalable framework to reduce biases like class imbalance and covariate shift in high-dimensional data, improving the reliability of machine learning models for detecting irregular power usage in noisy industrial datasets, with real-world economic benefits.

Contribution

The authors propose a novel, scalable method for bias reduction in high-dimensional data, specifically applied to irregular power usage detection, enhancing model accuracy and reliability.

Findings

01 Bias reduction improves detection accuracy

02 Models are deployed in commercial software

03 Significant economic value achieved

Abstract

In machine learning, a bias occurs whenever training sets are not representative of the test data, which results in unreliable models. The most common biases in data are arguably class imbalance and covariate shift. In this work, we aim to shed light on this topic in order to increase the overall attention paid to this issue in the field of machine learning. We propose a novel, scalable framework for reducing multiple biases in high-dimensional data sets in order to train more reliable predictors. We apply our methodology to the detection of irregular power usage from real, noisy industrial data. In emerging markets, irregular power usage, and electricity theft in particular, may amount to up to 40% of the total electricity distributed. Biased data sets are a particular issue in this domain. We show that reducing these biases increases the accuracy of the trained predictors. Our models have the…
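
The abstract names two biases, class imbalance and covariate shift, without showing how either is commonly reduced. The sketch below illustrates one generic approach, sample reweighting; it is not the authors' framework, whose details are not given on this page. The function names, the toy labels, and the `region` covariate are hypothetical and chosen only for illustration. It assumes a single discrete covariate and that every training group also occurs in the target data.

```python
from collections import Counter


def class_balance_weights(labels):
    """Inverse-frequency weights: each class ends up with equal total weight."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return [n / (n_classes * counts[y]) for y in labels]


def covariate_shift_weights(train_groups, target_groups):
    """Importance weights p_target(g) / p_train(g) for a discrete covariate g.

    Assumption: the covariate is categorical, so both densities can be
    estimated by simple relative frequencies.
    """
    train_freq = Counter(train_groups)
    target_freq = Counter(target_groups)
    n_train, n_target = len(train_groups), len(target_groups)
    return [
        (target_freq[g] / n_target) / (train_freq[g] / n_train)
        for g in train_groups
    ]


# Hypothetical toy data: 6 regular customers (0), 2 irregular (1).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical covariate: inspections over-sample the "north" region,
# while the target population is split evenly between regions.
train_regions = ["north"] * 6 + ["south"] * 2
target_regions = ["north"] * 4 + ["south"] * 4

class_w = class_balance_weights(labels)
shift_w = covariate_shift_weights(train_regions, target_regions)
```

Either weight vector could be passed to a learner that accepts per-sample weights. How to combine the two, and how to extend the idea to high-dimensional continuous covariates, are design questions this sketch does not address.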
