Data Discovery and Anomaly Detection Using Atypicality: Signal Processing Methods
Elyas Sabeti, Anders Høst-Madsen

TL;DR
This paper extends atypicality-based data analysis methods to real-valued data using minimum description length (MDL), enabling the detection of rare, interesting data segments in large datasets. The methods are demonstrated on recorded hydrophone data.
Contribution
It introduces a universal atypicality detection approach for real-valued data using MDL, expanding previous discrete data methods and applying them to practical signal processing models.
Findings
Method effectively detects rare data segments.
Theoretical properties align with discrete data case.
Applied successfully to hydrophone data.
Abstract
The aim of atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such "interesting" parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This is done by using minimum description length (MDL). We show that this shares a number of theoretical properties with the discrete-valued case. We develop the methodology for a number of "universal" signal processing models, and finally apply them to recorded hydrophone data.
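The codelength criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the fixed Gaussian typical model, the Gaussian atypical model with estimated mean and variance, the two-part MDL penalty of (k/2) log2 n bits, and the extra signalling cost `tau` are all assumptions chosen for illustration. A segment is flagged as atypical when coding it with its own fitted model, penalties included, takes fewer bits than coding it with the typical model.

```python
import math


def typical_codelength(x, mu=0.0, sigma2=1.0):
    """Bits to encode x with a fixed Gaussian "typical" model (assumed known).

    The constant due to quantizing real values is omitted because it is
    the same for both coders and cancels in the comparison.
    """
    n = len(x)
    nll_nats = 0.5 * n * math.log(2 * math.pi * sigma2)
    nll_nats += sum((v - mu) ** 2 for v in x) / (2 * sigma2)
    return nll_nats / math.log(2)


def atypical_codelength(x, tau=10.0):
    """Bits to encode x with a Gaussian fitted to the segment itself.

    Two-part MDL: negative log-likelihood at the maximum-likelihood
    estimates, plus (k/2) * log2(n) bits for k = 2 parameters, plus a
    hypothetical cost tau for signalling that an atypical segment occurs.
    """
    n = len(x)
    mean = sum(x) / n
    var = max(sum((v - mean) ** 2 for v in x) / n, 1e-12)  # guard against log(0)
    nll_nats = 0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    k = 2  # estimated parameters: mean and variance
    return nll_nats / math.log(2) + (k / 2) * math.log2(n) + tau


def is_atypical(x, tau=10.0):
    """Flag the segment if the self-fitted coder is shorter than the typical one."""
    return atypical_codelength(x, tau) < typical_codelength(x)


# A segment matching the typical model (mean 0, variance 1) is not flagged,
# while a tightly clustered segment far from zero is.
ordinary = [(-1.0) ** i for i in range(20)]
shifted = [5.0 + 0.1 * (-1.0) ** i for i in range(20)]
print(is_atypical(ordinary), is_atypical(shifted))
```

In this sketch the penalty terms are what keep ordinary data from being flagged: a segment that already fits the typical model gains nothing from a fitted model and still pays the parameter and signalling costs.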
Taxonomy
Topics: Blind Source Separation Techniques · Neural Networks and Applications · Algorithms and Data Compression
