IPD: An Incremental Prototype-based DBSCAN for large-scale data with cluster representatives

Jayasree Saha; Jayanta Mukherjee

arXiv:2202.07870 · cs.LG · October 12, 2023

Open Access

TL;DR

This paper introduces IPD, an incremental prototype-based DBSCAN algorithm designed for large-scale data, which efficiently identifies arbitrary-shaped clusters and selects representatives for improved querying.

Contribution

The paper presents a novel incremental clustering method that combines density-based clustering with prototype selection for large datasets.

Findings

01. Effectively handles large-scale data clustering.
02. Identifies arbitrary-shaped clusters.
03. Selects representative points for each cluster.

Abstract

DBSCAN is a fundamental density-based clustering technique that identifies clusters of arbitrary shape. However, it becomes infeasible when handling big data. On the other hand, centroid-based clustering is important for detecting patterns in a dataset, since unprocessed data points can be assigned the label of their nearest centroid. However, it cannot detect non-spherical clusters. For large data, it is not feasible to store and compute the labels of every sample. These can be computed as and when the information is required. This can be accomplished when clustering acts as a tool to identify cluster representatives, and a query is served by assigning it the cluster label of its nearest representative. In this paper, we propose an Incremental Prototype-based DBSCAN (IPD) algorithm which is designed to identify arbitrary-shaped clusters for large-scale data. Additionally, it chooses a set of…
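
The query step described in the abstract, labeling a point by its nearest cluster representative, can be sketched as follows. This is a minimal illustration and not the IPD algorithm itself: the function name `nearest_representative_label`, the Euclidean distance, and the sample representatives are all assumptions made for the example, and how IPD actually selects its representatives is not covered here.

```python
import math

def nearest_representative_label(query, representatives, labels):
    """Return the cluster label of the representative closest to `query`.

    Hypothetical helper: Euclidean distance is assumed for illustration.
    """
    if not representatives or len(representatives) != len(labels):
        raise ValueError("need one label per representative, and at least one")
    # Index of the representative with the smallest distance to the query
    best_index = min(
        range(len(representatives)),
        key=lambda i: math.dist(query, representatives[i]),
    )
    return labels[best_index]

# Made-up representatives: cluster 0 is summarized by three points,
# cluster 1 by two. Several representatives per cluster let the
# summary follow a non-spherical shape.
representatives = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (10.0, 10.0), (11.0, 10.5)]
labels = [0, 0, 0, 1, 1]

print(nearest_representative_label((1.8, 0.2), representatives, labels))   # 0
print(nearest_representative_label((10.4, 9.9), representatives, labels))  # 1
```

Because only the representatives and their labels are kept, a query costs one pass over the representative set rather than over the full dataset, which is the motivation the abstract gives for not storing a label for every sample.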

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Data Mining Algorithms and Applications