Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies

Uthsav Chitra; Kimberly Ding; Jasper C.H. Lee; Benjamin J. Raphael

arXiv:2007.07878 · cs.LG · June 14, 2021 · 1 citation

TL;DR

This paper investigates the bias of maximum likelihood estimators (MLEs) of structured anomalies, shows that the bias depends on the size of the anomaly family, and proposes a new, asymptotically unbiased estimator based on a mixture model.

Contribution

It provides a theoretical analysis of how the bias of the MLE depends on the size of the anomaly family, and it introduces a new, asymptotically unbiased estimator that applies to a range of anomaly structures.
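This page does not say how the mixture model is used, so the Python sketch below covers only one plausible ingredient: fitting the two-component Gaussian mixture (1 - alpha) * N(0, 1) + alpha * N(mu, 1) by expectation-maximization (EM) to estimate the fraction alpha of anomalous points and the anomalous mean mu. The unit-variance null component, the choice of EM, and the helper name `fit_null_alt_mixture` are assumptions for illustration, not the authors' estimator.

```python
import numpy as np


def fit_null_alt_mixture(x, n_iter=500, tol=1e-8):
    """Hypothetical helper (illustration only, not the paper's estimator).

    Fits (1 - alpha) * N(0, 1) + alpha * N(mu, 1) to the data by EM. The null
    component is fixed at N(0, 1); only the mixing weight alpha (estimated
    fraction of anomalous points) and the alternative mean mu are learned.
    Returns (alpha, mu, responsibilities).
    """
    x = np.asarray(x, dtype=float)
    alpha = 0.5
    mu = max(float(np.percentile(x, 90)), 0.5)  # crude start on the right tail
    for _ in range(n_iter):
        # E-step: posterior log-odds that x_i came from N(mu, 1) rather than N(0, 1),
        # using log[phi(x - mu) / phi(x)] = mu * x - mu^2 / 2.
        log_odds = np.log(alpha) - np.log1p(-alpha) + mu * x - 0.5 * mu ** 2
        resp = 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # numerically stable sigmoid
        # M-step: update the mixing weight and the alternative mean.
        new_alpha = float(np.clip(resp.mean(), 1e-6, 1 - 1e-6))
        new_mu = float(resp @ x / max(resp.sum(), 1e-12))
        converged = abs(new_alpha - alpha) < tol and abs(new_mu - mu) < tol
        alpha, mu = new_alpha, new_mu
        if converged:
            break
    return alpha, mu, resp


# Usage: simulated data with 2000 anomalous points out of 20000.
rng = np.random.default_rng(0)
n, k, mu_true = 20000, 2000, 3.0
x = rng.standard_normal(n)
x[:k] += mu_true
alpha_hat, mu_hat, _ = fit_null_alt_mixture(x)
size_hat = round(alpha_hat * n)  # estimated anomaly size
```

Rounding alpha_hat * n gives an estimated anomaly size. Turning that size (or the per-point responsibilities) into a set that belongs to the anomaly family is a separate step that this sketch does not attempt.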

Findings

01

MLE bias depends on the size of the anomaly family

02

MLE is asymptotically unbiased if the number of sets in the anomaly family that contain the anomaly is sub-exponential

03

Proposed mixture model estimator is asymptotically unbiased regardless of family size
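To illustrate how the size of the anomaly family can matter, the Python sketch below computes the MLE for two families of very different sizes: contiguous intervals of n points (O(n^2) sets) and all nonempty subsets (2^n - 1 sets). It assumes the unit-variance normal means model, with X_i ~ N(mu, 1) inside the anomaly and N(0, 1) outside, under which the MLE maximizes (sum of X_i over S) / sqrt(|S|) over the family. The helper names and the bias proxy E[|S_hat|] - |S| (error in the estimated anomaly size) are hypothetical choices for illustration; this page does not state how the paper measures bias.

```python
import numpy as np


def mle_interval(x):
    """MLE over contiguous intervals [i, j): maximize sum(x[i:j]) / sqrt(j - i)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    prefix = np.concatenate([[0.0], np.cumsum(x)])  # prefix[j] = sum(x[:j])
    best_score, best = -np.inf, (0, 1)
    for i in range(n):
        lengths = np.arange(1, n - i + 1)
        sums = prefix[i + 1:] - prefix[i]  # sums of x[i:j] for j = i+1..n
        scores = sums / np.sqrt(lengths)
        j = int(np.argmax(scores))
        if scores[j] > best_score:
            best_score, best = scores[j], (i, i + j + 1)
    return np.arange(*best)


def mle_all_subsets(x):
    """MLE over all nonempty subsets: for each size k the best set is the top-k values."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)  # indices sorted by decreasing value
    scores = np.cumsum(x[order]) / np.sqrt(np.arange(1, len(x) + 1))
    k = int(np.argmax(scores)) + 1
    return np.sort(order[:k])


def mean_size_error(mle, n=200, k=20, mu=2.0, trials=200, seed=0):
    """Monte Carlo estimate of E[|S_hat|] - |S| for the planted anomaly S = [0, k)."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        x = rng.standard_normal(n)
        x[:k] += mu  # S = [0, k) is both an interval and a subset
        errors.append(len(mle(x)) - k)
    return float(np.mean(errors))


print("intervals:  ", mean_size_error(mle_interval))
print("all subsets:", mean_size_error(mle_all_subsets))
```

The all-subsets MLE can be found without enumerating 2^n - 1 sets because, for a fixed size k, the score is maximized by the k largest values; the interval MLE uses prefix sums to score every interval in O(n^2) time.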

Abstract

Anomaly estimation, or the problem of finding a subset of a dataset that differs from the rest of the dataset, is a classic problem in machine learning and data mining. In both theoretical work and in applications, the anomaly is assumed to have a specific structure defined by membership in an anomaly family. For example, in temporal data the anomaly family may be time intervals, while in network data the anomaly family may be connected subgraphs. The most prominent approach for anomaly estimation is to compute the Maximum Likelihood Estimator (MLE) of the anomaly; however, it was recently observed that for normally distributed data, the MLE is a biased estimator for some anomaly families. In this work, we demonstrate that in the normal means setting, the bias of the MLE depends on the size of the anomaly family. We prove that if the number of sets in the anomaly…
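The MLE objective in the normal means setting can be derived in a few lines. Because the abstract is truncated before it defines the model, this worked derivation rests on stated assumptions: independent observations with unit variance, an unknown anomalous mean mu > 0, and an anomaly S* that belongs to the anomaly family F.

```latex
\begin{aligned}
X_i &\sim \mathcal{N}(\mu, 1) \ \text{for } i \in S^*, \qquad
X_i \sim \mathcal{N}(0, 1) \ \text{for } i \notin S^*, \qquad S^* \in \mathcal{F},\\
\ell(S, \mu) &= -\tfrac{1}{2}\sum_{i \in S}(X_i - \mu)^2 - \tfrac{1}{2}\sum_{i \notin S} X_i^2 + \text{const}
 = \mu \sum_{i \in S} X_i - \tfrac{|S|}{2}\mu^2 - \tfrac{1}{2}\sum_{i=1}^{n} X_i^2 + \text{const},\\
\hat{\mu}(S) &= \frac{1}{|S|}\sum_{i \in S} X_i
 \quad\Longrightarrow\quad
 \ell\big(S, \hat{\mu}(S)\big) = \frac{1}{2|S|}\Big(\sum_{i \in S} X_i\Big)^2 - \tfrac{1}{2}\sum_{i=1}^{n} X_i^2 + \text{const},\\
\hat{S} &= \operatorname*{arg\,max}_{S \in \mathcal{F}} \ \frac{1}{\sqrt{|S|}} \sum_{i \in S} X_i .
\end{aligned}
```

The last step requires the maximizing set to have a positive sum, so that the estimate of mu is admissible under mu > 0. For a set with a nonpositive sum, the supremum over mu > 0 is approached as mu tends to 0 and equals the null log-likelihood.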

Videos

Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies (SlidesLive)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Statistical Methods and Inference