Outlier Detection on Mixed-Type Data: An Energy-based Approach
Kien Do, Truyen Tran, Dinh Phung, Svetha Venkatesh

TL;DR
This paper introduces an unsupervised, energy-based outlier detection method for mixed-type data, built on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM), and demonstrates its effectiveness and scalability on synthetic and real datasets.
Contribution
The paper proposes a new outlier detection approach for mixed-type data that uses the free-energy of an Mv.RBM as the outlier score, addressing a key challenge in heterogeneous data analysis.
Findings
The method is fast, scalable, and competitive with state-of-the-art techniques.
Proper handling of mixed data types improves outlier detection accuracy.
Free-energy derived from Mv.RBM effectively identifies low-density outliers.
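Why free-energy can stand in for density: the relations below are the generic definitions for an energy-based model with visible vector x and hidden vector h, written as a sketch rather than the paper's own notation. The symbols E (joint energy), F (free-energy) and Z (partition function) are assumptions introduced here for illustration.

```latex
\begin{align}
F(x) &= -\log \sum_{h} \exp\bigl(-E(x, h)\bigr), \\
p(x) &= \frac{\exp\bigl(-F(x)\bigr)}{Z}, \qquad Z = \sum_{x'} \exp\bigl(-F(x')\bigr), \\
-\log p(x) &= F(x) + \log Z .
\end{align}
```

Since log Z does not depend on x, ranking points by F(x) gives the same order as ranking them by negative log-density, so a high free-energy marks a low-density point without Z ever being computed. For continuous attributes the sum over x' in Z becomes an integral.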
Abstract
Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use the free-energy derived from the Mv.RBM as the outlier score, detecting outliers as those data points lying in low-density regions. The method is fast to learn and compute, and is scalable to massive datasets. At the same time, the outlier score is identical to data…
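To make the scoring idea concrete, here is a minimal sketch of free-energy scoring for a toy RBM with binary hidden units and two visible types: binary attributes and unit-variance Gaussian attributes. It is not the authors' implementation. The function names (`free_energy`, `flag_outliers`), the parameter layout, the `contamination` threshold and the random, untrained weights are all assumptions made for illustration; a real Mv.RBM would be trained first and would support more attribute types.

```python
import numpy as np

def free_energy(x_bin, x_cont, a_bin, a_cont, W_bin, W_cont, b):
    """Free-energy per row for a toy RBM with binary hidden units.

    Visible types (an assumption of this sketch): binary attributes and
    unit-variance Gaussian attributes.
    """
    # Visible-only terms: linear bias for binary, quadratic for Gaussian.
    visible = -(x_bin @ a_bin) + 0.5 * np.sum((x_cont - a_cont) ** 2, axis=1)
    # Binary hidden units sum out analytically into one softplus per unit.
    pre_activation = b + x_bin @ W_bin + x_cont @ W_cont
    return visible - np.sum(np.logaddexp(0.0, pre_activation), axis=1)

def flag_outliers(scores, contamination=0.05):
    """Flag the highest-scoring fraction of points (hypothetical rule)."""
    threshold = np.quantile(scores, 1.0 - contamination)
    return scores > threshold

# Toy data and placeholder (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
n, d_bin, d_cont, k = 200, 3, 2, 4
x_bin = rng.integers(0, 2, size=(n, d_bin)).astype(float)
x_cont = rng.normal(size=(n, d_cont))
a_bin = rng.normal(scale=0.1, size=d_bin)
a_cont = np.zeros(d_cont)
W_bin = rng.normal(scale=0.1, size=(d_bin, k))
W_cont = rng.normal(scale=0.1, size=(d_cont, k))
b = np.zeros(k)

scores = free_energy(x_bin, x_cont, a_bin, a_cont, W_bin, W_cont, b)
is_outlier = flag_outliers(scores)
```

Scoring needs only one matrix product per attribute type and a softplus, with no sampling and no partition function. That is consistent with the abstract's claim that the score is fast to compute, although this sketch says nothing about training cost.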
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Currency Recognition and Detection
