TVOR: Finding Discrete Total Variation Outliers among Histograms
Nikola Banić, Neven Elezović

TL;DR
TVOR is a novel method for detecting outliers in histograms based on their discrete total variation, offering a distribution-agnostic, hyperparameter-free approach that outperforms traditional chi-squared tests in identifying smoothness anomalies.
Contribution
The paper introduces TVOR, a new outlier detection technique for histograms using a linear model of discrete total variation, applicable without prior distribution assumptions.
Findings
TVOR effectively detects smoothness outliers in real census data.
It outperforms Pearson's chi-squared test in identifying DTV outliers.
The method is versatile, handling arbitrary bin intervals and unbounded histograms.
Abstract
Pearson's chi-squared test can detect outliers in the data distribution of a given set of histograms. However, in fields such as demographics (e.g. for birth years), outliers may be easier to find in terms of histogram smoothness, where techniques such as Whipple's or Myers' indices successfully handle only specific anomalies. This paper proposes detecting smoothness outliers among histograms by using the relation between their discrete total variations (DTV) and their respective sample sizes. This relation is mathematically derived to be applicable in all cases and is simplified by an accurate linear model. The deviation of a histogram's DTV from the value predicted by the model is used as the outlier score, and the proposed method is named Total Variation Outlier Recognizer (TVOR). TVOR requires no prior assumptions about the distribution of the histograms' samples, it has no…
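The scoring idea described in the abstract can be illustrated with a minimal sketch. It computes each histogram's DTV as the sum of absolute differences between adjacent bin counts, fits a straight line relating DTV to sample size, and uses the residual as the outlier score. The abstract does not state the exact form of the authors' linear model, so the log-log least-squares fit below is an assumption made for illustration only, and the function names `discrete_total_variation`, `fit_line`, and `dtv_outlier_scores` are hypothetical rather than taken from the paper or its code.

```python
import math


def discrete_total_variation(hist):
    """Sum of absolute differences between adjacent bin counts."""
    return sum(abs(b - a) for a, b in zip(hist, hist[1:]))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dtv_outlier_scores(histograms):
    """Residual of log(DTV) from a line fitted against log(sample size).

    ASSUMPTION: the log-log line is a stand-in for the paper's linear
    model, whose exact form is not given in the abstract. Requires each
    histogram to have a positive sample size and a positive DTV.
    """
    xs = [math.log(sum(h)) for h in histograms]
    ys = [math.log(discrete_total_variation(h)) for h in histograms]
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Made-up toy data: five smooth histograms of growing size and one
# jagged histogram appended last.
smooth = [[k * c for c in (1, 2, 3, 2, 1)] for k in (1, 2, 4, 8, 16)]
jagged = [12, 0, 12, 0, 12]
scores = dtv_outlier_scores(smooth + [jagged])
most_suspicious = scores.index(max(scores))
```

A larger positive score means the histogram is rougher than the fitted relation predicts for its sample size. In the toy data the jagged histogram receives the highest score.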
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Statistical Methods and Models · Statistical Methods and Inference
