Classifying token frequencies using angular Minkowski $p$-distance
Oliver Urs Lenz, Chris Cornelis

TL;DR
This paper explores the use of angular Minkowski p-distance for classifying token frequency data, demonstrating that it can outperform cosine dissimilarity when its hyperparameters are suitably tuned.
Contribution
It introduces angular Minkowski p-distance as an alternative to cosine dissimilarity for token frequency classification and evaluates its effectiveness on the 20-newsgroups dataset.
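The following is a minimal Python sketch of the measure. It rests on an assumption that this summary does not spell out: each token-frequency vector is first rescaled to unit p-norm, and the Minkowski p-distance is then taken between the rescaled vectors. The function names are hypothetical. At p = 2, half the squared distance equals cosine dissimilarity, which is the sense in which the measure generalises it.

```python
import math
from typing import Sequence


def minkowski_norm(x: Sequence[float], p: float) -> float:
    """p-norm of x for p > 0 (a true norm only when p >= 1); math.inf gives the max norm."""
    if math.isinf(p):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def angular_minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski p-distance between x and y after rescaling each to unit p-norm.

    Assumption (not confirmed by the summary): the same p is used for the
    rescaling step and for the distance itself.
    """
    nx, ny = minkowski_norm(x, p), minkowski_norm(y, p)
    if nx == 0 or ny == 0:
        raise ValueError("a zero vector has no direction")
    diff = [a / nx - b / ny for a, b in zip(x, y)]
    return minkowski_norm(diff, p)


def cosine_dissimilarity(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cos(x, y), the baseline measure the paper compares against."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1.0 - dot / (minkowski_norm(x, 2) * minkowski_norm(y, 2))
```

Because each vector is rescaled first, multiplying a document's counts by a constant leaves all of its distances unchanged, just as with cosine dissimilarity. Only the choice of p changes how differences spread across many tokens are weighed against a few large ones.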
Findings
Higher classification accuracy with suitable p values
Performance depends on hyperparameters like p, k, and weights
Angular Minkowski p-distance can outperform cosine dissimilarity
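A minimal weighted k-nearest-neighbour classifier in Python makes the roles of p, k and the weights concrete. This is an illustrative sketch, not the authors' experimental setup. The distance definition (unit-p-norm rescaling followed by Minkowski p-distance) is an assumption. The three rank-based weighting schemes and all function names are hypothetical examples. Fuzzy rough nearest neighbours is not shown.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Sequence


def angular_minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Assumed form: Minkowski p-distance between unit-p-norm rescaled vectors (nonzero inputs)."""
    def norm(v: Sequence[float]) -> float:
        if math.isinf(p):
            return max(abs(a) for a in v)
        return sum(abs(a) ** p for a in v) ** (1.0 / p)

    nx, ny = norm(x), norm(y)
    return norm([a / nx - b / ny for a, b in zip(x, y)])


# Hypothetical weighting schemes over neighbour ranks 1..k (illustrative only).
WEIGHTS: Dict[str, Callable[[int, int], float]] = {
    "uniform": lambda rank, k: 1.0,                # every neighbour counts equally
    "reciprocal": lambda rank, k: 1.0 / rank,      # 1, 1/2, 1/3, ...
    "linear": lambda rank, k: (k + 1 - rank) / k,  # k/k, (k-1)/k, ..., 1/k
}


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[str],
    query: Sequence[float],
    p: float = 2.0,
    k: int = 5,
    weights: str = "uniform",
) -> str:
    """Return the class with the largest summed weight among the k nearest neighbours."""
    # Sort training indices by their distance to the query (stable sort, so ties keep index order).
    ranked = sorted(range(len(train_X)), key=lambda i: angular_minkowski(train_X[i], query, p))
    weight_fn = WEIGHTS[weights]
    scores: Dict[str, float] = defaultdict(float)
    for rank, i in enumerate(ranked[:k], start=1):
        scores[train_y[i]] += weight_fn(rank, k)
    # Ties in score go to the class encountered first, i.e. the one with the nearer neighbour.
    return max(scores, key=scores.get)
```

For example, with toy count vectors `[[5, 0, 1], [4, 1, 0], [0, 3, 4], [1, 0, 5]]` labelled `sport, sport, politics, politics`, the query `[3, 1, 1]` is assigned `sport` at p = 1 and k = 1. Varying `p`, `k` and `weights` independently is the kind of sweep the findings refer to.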
Abstract
Angular Minkowski $p$-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski $p$-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski $p$-distance may be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter $p$, the dimensionality of the dataset, the number of neighbours $k$, the choice of weights and the choice of classifier. We conclude that it is possible to obtain substantially higher classification performance with angular Minkowski $p$-distance with suitable values for $p$ than with…
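The construction in the abstract rests on a standard identity. For nonzero vectors normalised to unit Euclidean length, the squared Euclidean distance between them is twice the cosine dissimilarity. Replacing the 2-norm with a p-norm then gives the generalised form below. That the same p is used for both the normalisation and the distance is an assumption here; this summary does not state it.

```latex
\hat{x} = \frac{x}{\lVert x \rVert_2}, \quad \hat{y} = \frac{y}{\lVert y \rVert_2}, \qquad
\lVert \hat{x} - \hat{y} \rVert_2^2
  = \lVert \hat{x} \rVert_2^2 + \lVert \hat{y} \rVert_2^2 - 2 \langle \hat{x}, \hat{y} \rangle
  = 2\bigl(1 - \cos(x, y)\bigr)

% assumed generalisation: the same p for normalisation and distance
d_p(x, y) = \left\lVert \frac{x}{\lVert x \rVert_p} - \frac{y}{\lVert y \rVert_p} \right\rVert_p,
\qquad \lVert z \rVert_p = \Bigl( \sum_i \lvert z_i \rvert^p \Bigr)^{1/p}
```

At p = 2 this reduces to $\sqrt{2(1 - \cos(x, y))}$, a monotone function of cosine dissimilarity. A nearest-neighbour classifier therefore ranks neighbours identically under either measure, so any difference in results comes from moving p away from 2.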
