TL;DR
This paper introduces SDCOR, a scalable density-based clustering method for local outlier detection in massive datasets. It processes the data chunk by chunk, so outliers are identified within a limited memory budget.
Contribution
The paper proposes a novel batch-wise clustering approach that scales to large datasets and accurately detects outliers without requiring all data in memory.
Findings
Runs in low linear time, as demonstrated on real-life and synthetic datasets
More effective than traditional density-based methods
Outperforms some fast distance-based methods in efficiency
Abstract
This paper presents a batch-wise density-based clustering approach for local outlier detection in massive-scale datasets. Unlike the well-known traditional algorithms, which assume that all the data is memory-resident, our proposed method is scalable and processes the input data chunk-by-chunk within the confines of a limited memory buffer. A temporary clustering model is built in the first phase; then, it is gradually updated by analyzing consecutive memory loads of points. Subsequently, at the end of scalable clustering, the approximate structure of the original clusters is obtained. Finally, by another scan of the entire dataset and using a suitable criterion, an outlying score called SDCOR (Scalable Density-based Clustering Outlierness Ratio) is assigned to each object. Evaluations on real-life and synthetic datasets demonstrate that the proposed method has a low linear time…
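The two-pass flow described in the abstract (build and update a cluster model chunk by chunk, then rescan the data to score every object) can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It rests on several assumptions made only for illustration: a simple nearest-centroid rule with a hypothetical `radius` threshold stands in for the density-based clustering step, each cluster is kept as running sufficient statistics, and Mahalanobis distance to the nearest sizeable cluster stands in for the "suitable criterion" that the abstract leaves unspecified. All names (`ClusterSummary`, `iter_chunks`, `build_summaries`, `score_chunks`, `min_size`) are hypothetical.

```python
import numpy as np


class ClusterSummary:
    """Running sufficient statistics for one cluster (illustrative only)."""

    def __init__(self, dim):
        self.n = 0                        # number of points absorbed so far
        self.s = np.zeros(dim)            # sum of points
        self.ss = np.zeros((dim, dim))    # sum of outer products

    def add(self, x):
        self.n += 1
        self.s += x
        self.ss += np.outer(x, x)

    @property
    def mean(self):
        return self.s / self.n

    def cov(self, ridge=1e-6):
        # Covariance from the sufficient statistics; the small ridge term
        # keeps the matrix invertible for tiny or degenerate clusters.
        m = self.mean
        return self.ss / self.n - np.outer(m, m) + ridge * np.eye(len(m))


def iter_chunks(data, chunk_size):
    """Yield consecutive 'memory loads' of at most chunk_size rows."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def build_summaries(chunks, radius):
    """Pass 1: update the cluster summaries one chunk at a time.

    Assumption: a point joins the nearest existing cluster if it lies within
    `radius` of that cluster's mean; otherwise it starts a new cluster.
    """
    summaries = []
    for chunk in chunks:
        for x in chunk:
            if summaries:
                dists = [np.linalg.norm(x - c.mean) for c in summaries]
                j = int(np.argmin(dists))
                if dists[j] <= radius:
                    summaries[j].add(x)
                    continue
            fresh = ClusterSummary(len(x))
            fresh.add(x)
            summaries.append(fresh)
    return summaries


def score_chunks(chunks, summaries, min_size=5):
    """Pass 2: rescan the data and score each point.

    Assumption: the score is the Mahalanobis distance to the nearest cluster
    holding at least `min_size` points; larger means more outlying.
    """
    big = [c for c in summaries if c.n >= min_size]
    means = [c.mean for c in big]
    inv_covs = [np.linalg.inv(c.cov()) for c in big]
    scores = []
    for chunk in chunks:
        for x in chunk:
            scores.append(min(
                float(np.sqrt((x - m) @ ic @ (x - m)))
                for m, ic in zip(means, inv_covs)
            ))
    return np.array(scores)
```

A possible way to call the sketch is `summaries = build_summaries(iter_chunks(data, 64), radius=4.0)` followed by `scores = score_chunks(iter_chunks(data, 64), summaries)`. The chunk iterator is created twice because the second pass rescans the whole dataset, while only the per-cluster summaries are held in memory between the two passes.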
