Density based Spatial Clustering of Lines via Probabilistic Generation of Neighbourhood
Akanksha Das, Malay Bhattacharyya

TL;DR
This paper introduces a density-based clustering algorithm for lines in high-dimensional spaces that is insensitive to outliers, can cluster data with missing entries, and can incorporate domain knowledge, with applications demonstrated on synthetic and real-world datasets.
Contribution
It generalizes density-based clustering from points to lines in high-dimensional spaces using a probabilistically generated neighbourhood, which sidesteps the lack of a distance measure between lines that satisfies the triangle inequality.
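To illustrate why an ordinary distance is problematic for lines, here is a minimal sketch showing that the minimum Euclidean distance between lines violates the triangle inequality. The function name `line_distance` and the three example lines are illustrative assumptions and are not taken from the paper.

```python
import math

def line_distance(p1, d1, p2, d2, eps=1e-12):
    """Minimum Euclidean distance between the lines p1 + s*d1 and p2 + t*d2.

    Hypothetical helper for illustration only; not the paper's method.
    """
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    w = [x - y for x, y in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        # Parallel lines: fix s = 0 and project onto the second line.
        s, t = 0.0, e / c
    else:
        s = (b * e - c * d) / denom
        t = (a * e - b * d) / denom
    diff = [wi + s * x - t * y for wi, x, y in zip(w, d1, d2)]
    return math.sqrt(dot(diff, diff))

# A and B are parallel lines one unit apart; C crosses both of them.
A = ([0, 0, 0], [1, 0, 0])
B = ([0, 0, 1], [1, 0, 0])
C = ([0, 0, 0], [0, 0, 1])

d_ab = line_distance(*A, *B)
d_ac = line_distance(*A, *C)
d_cb = line_distance(*C, *B)
print(d_ab, d_ac, d_cb)
```

Here the distance from A to B exceeds the sum of the distances from A to C and from C to B, so this measure is not a metric on lines.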
Findings
Effective noise and outlier detection in clustering
Ability to cluster incomplete high-dimensional data
Successful application to real-world datasets like rail and road networks
Abstract
Density based spatial clustering of points in ℝⁿ has a myriad of applications in a variety of industries. We generalise this problem to the density based clustering of lines in high-dimensional spaces, keeping in mind that there exists no valid distance measure for lines that follows the triangle inequality. In this paper, we design a clustering algorithm that generates a customised neighbourhood of a fixed volume (given as a parameter) for each line, based on an optional parameter in the form of a continuous probability density function. This algorithm is not sensitive to outliers and can effectively identify the noise in the data using a cardinality parameter. One of the pivotal applications of this algorithm is clustering data points in ℝⁿ with missing entries, while utilising the domain knowledge of the respective data. In particular, the proposed algorithm is able to…
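One plausible reading of the missing-entries application is that a record with a single unknown coordinate corresponds to a line parallel to that coordinate's axis. The sketch below encodes that reading. The function name `missing_entry_to_line`, the use of `None` for a missing value, and the anchor-plus-direction representation are assumptions made for illustration, not details confirmed by the abstract.

```python
def missing_entry_to_line(record):
    """Map a record with exactly one missing entry (None) to a line.

    Returns (anchor, direction): the anchor sets the missing coordinate
    to 0.0, and the direction is the unit vector along the missing axis.
    Hypothetical helper; the paper's own representation may differ.
    """
    missing = [i for i, v in enumerate(record) if v is None]
    if len(missing) != 1:
        # More than one gap would describe a higher-dimensional flat, not a line.
        raise ValueError("expected exactly one missing entry")
    k = missing[0]
    anchor = [0.0 if v is None else float(v) for v in record]
    direction = [1.0 if i == k else 0.0 for i in range(len(record))]
    return anchor, direction

anchor, direction = missing_entry_to_line([2.0, None, 5.0])
print(anchor, direction)
```

Every point on the resulting line agrees with the record on its known coordinates, so clustering such lines amounts to clustering the incomplete records without imputing a value first.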
Taxonomy
Topics: Advanced Clustering Algorithms Research
