Quantifying and Reducing Bias in Maximum Likelihood Estimation of Structured Anomalies
Uthsav Chitra, Kimberly Ding, Jasper C.H. Lee, Benjamin J. Raphael

TL;DR
This paper investigates the bias of maximum likelihood estimators for structured anomalies, shows that the bias depends on the size of the anomaly family, and proposes a new mixture-model estimator that is asymptotically unbiased.
Contribution
The paper gives a theoretical analysis of how MLE bias depends on the size of the anomaly family, and introduces a new asymptotically unbiased estimator that applies to a range of anomaly structures.
Findings
MLE bias depends on the size of the anomaly family
MLE is asymptotically unbiased if the number of sets is sub-exponential
Proposed mixture model estimator is asymptotically unbiased regardless of family size
Abstract
Anomaly estimation, or the problem of finding a subset of a dataset that differs from the rest of the dataset, is a classic problem in machine learning and data mining. In both theoretical work and in applications, the anomaly is assumed to have a specific structure defined by membership in an anomaly family. For example, in temporal data the anomaly family may be time intervals, while in network data the anomaly family may be connected subgraphs. The most prominent approach for anomaly estimation is to compute the Maximum Likelihood Estimator (MLE) of the anomaly; however, it was recently observed that for normally distributed data, the MLE is a biased estimator for some anomaly families. In this work, we demonstrate that in the normal means setting, the bias of the MLE depends on the size of the anomaly family. We prove that if the number of sets in the anomaly…
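The dependence of MLE bias on family size can be illustrated with a small simulation. This is a minimal sketch, not the authors' code. It assumes the standard normal means model: observations are N(mu, 1) on the anomaly and N(0, 1) elsewhere, with unit variance and an unknown positive shift mu. Under that assumption, maximizing the likelihood over a family reduces to maximizing sum(S) / sqrt(|S|) over sets S in the family. The function names, sample size, and value of mu below are illustrative choices. The sketch compares a small family (contiguous intervals, quadratically many sets) with the largest possible family (all subsets, exponentially many sets).

```python
import numpy as np


def simulate(n, size, mu, seed):
    """Draw X_i ~ N(mu, 1) on a planted interval and N(0, 1) elsewhere."""
    rng = np.random.default_rng(seed)
    start = (n - size) // 2
    anomaly = np.arange(start, start + size)
    x = rng.standard_normal(n)
    x[anomaly] += mu
    return x, anomaly


def mle_all_subsets(x):
    """Maximize sum(S) / sqrt(|S|) over every non-empty subset S."""
    # For a fixed size k the best subset is the k largest values,
    # so it suffices to scan prefixes of the data sorted in descending order.
    order = np.argsort(x)[::-1]
    scores = np.cumsum(x[order]) / np.sqrt(np.arange(1, len(x) + 1))
    k = int(np.argmax(scores)) + 1
    return np.sort(order[:k])


def mle_intervals(x):
    """Maximize sum(S) / sqrt(|S|) over contiguous intervals S = [i, j)."""
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    best_score, best = -np.inf, (0, 1)
    for i in range(n):
        # Scores of all intervals that start at index i.
        lengths = np.arange(1, n - i + 1)
        scores = (prefix[i + 1:] - prefix[i]) / np.sqrt(lengths)
        j = int(np.argmax(scores))
        if scores[j] > best_score:
            best_score, best = scores[j], (i, i + j + 1)
    return np.arange(best[0], best[1])


if __name__ == "__main__":
    x, anomaly = simulate(n=2000, size=200, mu=1.0, seed=0)
    print("true anomaly size:   ", len(anomaly))
    print("all-subsets MLE size:", len(mle_all_subsets(x)))
    print("interval MLE size:   ", len(mle_intervals(x)))
```

In runs of this kind, the all-subsets estimate tends to be noticeably larger than the planted anomaly, while the interval estimate tends to stay close to it. This matches the qualitative claim that bias grows with family size, though a single simulation is only suggestive.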
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Statistical Methods and Inference
