Classifying token frequencies using angular Minkowski $p$-distance

Oliver Urs Lenz; Chris Cornelis

arXiv:2309.14495 · cs.LG · September 27, 2023

TL;DR

This paper explores angular Minkowski $p$-distance for classifying token frequency data and shows that, with a suitable choice of $p$, it can outperform cosine dissimilarity.

Contribution

It introduces angular Minkowski $p$-distance as an alternative to cosine dissimilarity for token frequency classification and evaluates it on the 20-newsgroups dataset with weighted nearest neighbours and fuzzy rough nearest neighbours classifiers.
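As a worked reading of this contribution: cosine dissimilarity equals half the squared Euclidean distance between the two vectors after each is rescaled to unit Euclidean length, which is where Euclidean distance enters its definition. Swapping every 2-norm for a $p$-norm on $m$-dimensional vectors gives the expression $\delta_p$ below. The symbol $\delta_p$, the use of the same $p$ for normalisation and distance, and the absence of any extra scaling or squaring are assumptions of this sketch; the paper's exact definition may differ in these details.

```latex
\begin{align*}
\|z\|_p &= \Big( \sum_{i=1}^{m} |z_i|^p \Big)^{1/p} \\
1 - \frac{\langle x, y \rangle}{\|x\|_2 \, \|y\|_2}
  &= \frac{1}{2} \left\| \frac{x}{\|x\|_2} - \frac{y}{\|y\|_2} \right\|_2^2 \\
\delta_p(x, y) &= \left\| \frac{x}{\|x\|_p} - \frac{y}{\|y\|_p} \right\|_p
\end{align*}
```

For $p = 2$, $\delta_2$ is a monotone function of cosine dissimilarity, so the nearest-neighbour ranking it induces is the same.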

Findings

01. Higher classification accuracy with suitable values of $p$

02. Performance depends on hyperparameters such as $p$, $k$ and the choice of weights

03. Angular Minkowski $p$-distance can outperform cosine dissimilarity
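The dependence on $p$, $k$ and the weights can be explored with a small weighted $k$-nearest-neighbour classifier. This is a hedged sketch, not the authors' experimental setup: the name `knn_predict` and the two weighting schemes (`"uniform"`, and `"rank"`, where the $i$-th nearest neighbour votes $1/i$) are hypothetical, the distance assumes each document is rescaled to unit $p$-norm before taking Minkowski $p$-distance, and the fuzzy rough nearest neighbours classifier from the paper is not reproduced.

```python
from collections import defaultdict

import numpy as np


def normalise_rows(X, p):
    """Rescale each row (one document's token counts) to unit p-norm."""
    norms = np.sum(np.abs(X) ** p, axis=1) ** (1.0 / p)
    if np.any(norms == 0):
        raise ValueError("every document needs at least one token")
    return X / norms[:, None]


def knn_predict(X_train, y_train, X_test, p=2.0, k=5, weights="uniform"):
    """Weighted k-NN with an assumed angular Minkowski p-distance.

    weights: 'uniform' (each neighbour votes 1) or 'rank' (i-th nearest votes 1/i);
    both schemes are hypothetical examples.
    """
    if weights not in ("uniform", "rank"):
        raise ValueError("weights must be 'uniform' or 'rank'")
    # Normalise once up front so each query only needs a Minkowski distance.
    A = normalise_rows(np.asarray(X_train, dtype=float), p)
    B = normalise_rows(np.asarray(X_test, dtype=float), p)
    y_train = np.asarray(y_train)
    predictions = []
    for b in B:
        # Minkowski p-distance from this test document to every training document.
        d = np.sum(np.abs(A - b) ** p, axis=1) ** (1.0 / p)
        nearest = np.argsort(d, kind="stable")[:k]
        votes = defaultdict(float)
        for rank, idx in enumerate(nearest, start=1):
            votes[y_train[idx]] += 1.0 if weights == "uniform" else 1.0 / rank
        # Highest total vote wins; ties go to the class seen first (the nearer one).
        predictions.append(max(votes, key=votes.get))
    return np.array(predictions)


# Toy token counts over a 3-word vocabulary (made-up data).
X_train = [[5, 0, 0], [4, 1, 0], [0, 0, 6], [0, 1, 5]]
y_train = ["a", "a", "b", "b"]
X_test = [[3, 0, 1], [0, 0, 2]]
for p in (1.0, 2.0):
    for k in (1, 3):
        print(p, k, knn_predict(X_train, y_train, X_test, p=p, k=k, weights="rank").tolist())
```

On real data such as 20-newsgroups one would sweep $p$, $k$ and the weight scheme on a validation split; the loop above only illustrates the interface.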

Abstract

Angular Minkowski $p$-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski $p$-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski $p$-distance may potentially be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter $p$, the dimensionality $m$ of the dataset, the number of neighbours $k$, the choice of weights and the choice of classifier. We conclude that it is possible to obtain substantially higher classification performance with angular Minkowski $p$-distance with suitable values for $p$ than with…
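A minimal Python sketch of the measure described in the abstract, assuming that each token-frequency vector is rescaled to unit $p$-norm and the two rescaled vectors are then compared with Minkowski $p$-distance. The function names `p_norm`, `angular_minkowski` and `cosine_dissimilarity` are hypothetical, and the paper's exact definition may include a different scaling. The final check shows why $p = 2$ corresponds to cosine dissimilarity: half the squared distance equals $1 - \cos(x, y)$.

```python
import numpy as np


def p_norm(z, p):
    """Minkowski p-norm (sum_i |z_i|^p)^(1/p) for p > 0 (a norm only when p >= 1)."""
    return np.sum(np.abs(z) ** p) ** (1.0 / p)


def angular_minkowski(x, y, p=2.0):
    """Minkowski p-distance between x and y after rescaling each to unit p-norm.

    Assumption: the same p is used for the normalisation and the distance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = p_norm(x, p), p_norm(y, p)
    if nx == 0 or ny == 0:
        raise ValueError("an empty token-frequency vector has no direction")
    return p_norm(x / nx - y / ny, p)


def cosine_dissimilarity(x, y):
    """1 - cos(x, y), the usual cosine dissimilarity."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y))


# Token counts for two short documents over a 4-word vocabulary (made-up data).
x = [3, 0, 1, 2]
y = [1, 1, 0, 2]
for p in (1.0, 1.5, 2.0, 4.0):
    print(p, angular_minkowski(x, y, p))

# For p = 2, half the squared distance equals cosine dissimilarity.
print(np.isclose(angular_minkowski(x, y, 2.0) ** 2 / 2, cosine_dissimilarity(x, y)))  # True
```

Because of the normalisation step, the measure ignores document length: scaling every count in `x` by the same factor leaves the distance unchanged, just as with cosine dissimilarity.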

Taxonomy

Topics: Morphological variations and asymmetry · Topological and Geometric Data Analysis · Image Processing and 3D Reconstruction