Nearest Neighbor based Clustering Algorithm for Large Data Sets
Pankaj Kumar Yadav, Sriniwas Pandey, Sraban Kumar Mohanty

TL;DR
This paper improves the I/O efficiency of clustering algorithms for large datasets by redesigning the Shared Near Neighbors algorithm within the external memory model, maintaining computational complexity while reducing disk access.
Contribution
It presents an external memory model implementation of the Shared Near Neighbors clustering algorithm, significantly reducing I/O complexity for large data sets.
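For orientation, the in-memory form of Shared Near Neighbors clustering can be sketched as below. This follows the classic Jarvis-Patrick formulation: two points are merged when each appears in the other's k-nearest-neighbor list and the two lists share at least a threshold number of entries. The function names, the `min_shared` parameter, and the union-find merging are illustrative assumptions, and the paper's external memory variant may differ in its details.

```python
from math import dist


def knn_lists(points, k):
    """For each point index, return the set of its k nearest neighbor indices."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute-force ranking of all other points by Euclidean distance.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_clusters(points, k, min_shared):
    """Assumed Jarvis-Patrick style rule: merge mutual neighbors that share
    at least `min_shared` entries in their k-nearest-neighbor lists."""
    nn = knn_lists(points, k)
    parent = list(range(len(points)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in nn[i]:
            # i < j ensures each mutual pair is examined once.
            if i < j and i in nn[j] and len(nn[i] & nn[j]) >= min_shared:
                parent[find(i)] = find(j)

    # Points with the same representative belong to the same cluster.
    return [find(i) for i in range(len(points))]
```

Calling `snn_clusters(points, k=3, min_shared=1)` on two well-separated groups of points returns one label per point, with equal labels inside each group. The brute-force neighbor search is quadratic in the number of points and assumes all data fits in memory, which is exactly the assumption the paper sets out to remove.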
Findings
Reduced I/O complexity in clustering large datasets
Maintained computational complexity despite I/O improvements
Validated performance with STXXL library implementation
Abstract
Clustering is an unsupervised learning technique in which data or objects are grouped into sets based on some similarity measure. Most clustering algorithms assume that main memory is unbounded and can hold the entire set of patterns. In practice, many applications produce pattern sets too large to fit in main memory, so much of the data resides in secondary memory. Disk input/output (I/O) then becomes the major bottleneck in designing efficient clustering algorithms for large data sets. Various design techniques have been used to build clustering algorithms for large data sets. External memory algorithms are one such class: they exploit the hierarchical memory structure of computers by incorporating locality of reference directly in the algorithm. This…
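The idea of building locality of reference into the algorithm can be illustrated with a toy block-nested-loop nearest neighbor search. The sketch below is not the paper's algorithm and does not use STXXL. `BlockedDisk`, `read_block`, and `knn_blocked` are hypothetical names, and the "disk" is a Python list that simply counts block transfers. It shows how holding one block in memory while scanning the others lets every point in that block reuse each transferred block.

```python
import heapq
from math import dist


class BlockedDisk:
    """Toy stand-in for secondary storage: records are read one block at a time."""

    def __init__(self, records, block_size):
        self.records = records
        self.block_size = block_size
        self.reads = 0  # number of block transfers performed so far

    def num_blocks(self):
        return (len(self.records) + self.block_size - 1) // self.block_size

    def read_block(self, b):
        """Return block b as a list of (global_index, record) pairs."""
        self.reads += 1
        start = b * self.block_size
        return list(enumerate(self.records[start:start + self.block_size], start))


def knn_blocked(disk, k):
    """k nearest neighbors of every record, keeping one outer block resident."""
    result = {}
    for ob in range(disk.num_blocks()):
        outer = disk.read_block(ob)
        cand = {i: [] for i, _ in outer}
        for ib in range(disk.num_blocks()):
            # Reuse the resident block instead of transferring it again.
            inner = outer if ib == ob else disk.read_block(ib)
            for i, p in outer:
                for j, q in inner:
                    if i != j:
                        cand[i].append((dist(p, q), j))
                # Keep only the k best candidates seen so far for point i.
                cand[i] = heapq.nsmallest(k, cand[i])
        for i in cand:
            result[i] = [j for _, j in cand[i]]
    return result
```

With N records in blocks of B, this sketch performs (N/B)^2 block reads, whereas scanning the whole data set once per point would take N·(N/B). For 8 records in blocks of 4, that is 4 reads instead of 16.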
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Mining Algorithms and Applications · Face and Expression Recognition
