Automated, Unsupervised, and Auto-parameterized Inference of Data Patterns and Anomaly Detection
Qiaolin Qin, Heng Li, Ettore Merlo, Maxime Lamothe

TL;DR
This paper presents RIOLU, an automated, unsupervised method for inferring data patterns and detecting anomalies that outperforms existing approaches in accuracy and efficiency without requiring labeled data or manual configuration.
Contribution
RIOLU introduces a fully automated, automatically parameterized approach to pattern inference and anomaly detection on unlabeled data, and it outperforms existing state-of-the-art tools.
Findings
Achieves a 97.2% F1 score in pattern inference.
Up to 800.4% improvement in anomaly detection F1 score.
Outperforms ChatGPT in accuracy and inference time.
Abstract
With the advent of data-centric and machine learning (ML) systems, data quality is playing an increasingly critical role in ensuring the overall quality of software systems. Data preparation, an essential step towards high data quality, is known to be a highly effort-intensive process. Although prior studies have dealt with one of the most impactful issues, data pattern violations, these studies usually require data-specific configurations (i.e., they are parameterized) or use carefully curated data as learning examples (i.e., they are supervised), relying on domain knowledge and a deep understanding of the data, or demanding significant manual effort. In this paper, we introduce RIOLU: Regex Inferencer auto-parameterized Learning with Uncleaned data. RIOLU is fully automated, automatically parameterized, and does not need labeled samples. RIOLU can generate precise patterns from datasets in various…
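To make the idea of inferring a pattern from uncleaned data and flagging violations concrete, here is a minimal Python sketch. It is not RIOLU's algorithm, which the abstract does not detail here. It is a deliberately naive stand-in that assumes the most frequent coarse character-class template in a column is the intended pattern. The function names `generalize`, `infer_pattern`, and `flag_anomalies` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Split a value into runs of ASCII digits, runs of ASCII letters,
# or single characters of any other kind.
_TOKEN = re.compile(r"[0-9]+|[A-Za-z]+|.", re.DOTALL)


def generalize(value: str) -> str:
    """Map a string to a coarse regex template (illustrative heuristic only)."""
    parts = []
    for token in _TOKEN.findall(value):
        first = token[0]
        if "0" <= first <= "9":
            parts.append("[0-9]+")          # any run of digits
        elif first.isascii() and first.isalpha():
            parts.append("[A-Za-z]+")       # any run of letters
        else:
            parts.append(re.escape(token))  # keep separators literally
    return "".join(parts)


def infer_pattern(values: list[str]) -> str:
    """Assumption: the most common template in the column is the 'true' pattern."""
    if not values:
        raise ValueError("cannot infer a pattern from an empty column")
    template, _count = Counter(generalize(v) for v in values).most_common(1)[0]
    return template


def flag_anomalies(values: list[str], pattern: str) -> list[str]:
    """Return the values that do not fully match the inferred pattern."""
    compiled = re.compile(pattern, re.DOTALL)
    return [v for v in values if compiled.fullmatch(v) is None]


# A small column with one value in a different date format.
dates = ["2023-01-15", "2022-11-03", "2021-07-30", "15/01/2023", "2020-02-29"]
pattern = infer_pattern(dates)
outliers = flag_anomalies(dates, pattern)
```

With this toy column, the only value that fails to match the inferred template is the slash-separated date. A majority-vote heuristic like this needs no labels, but it breaks down when a column legitimately mixes several formats, which is part of what makes fully automated pattern inference a hard problem.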
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Network Security and Intrusion Detection
