A general framework for implementing distances for categorical variables
Michel van de Velden, Alfonso Iodice D'Enza, Angelos Markos, and Carlo Cavicchia

TL;DR
This paper presents a versatile framework for defining and implementing distances between categorical variables, enhancing classification and clustering methods by allowing flexible, data-specific, and potentially more effective distance measures.
Contribution
The authors introduce a general, efficient framework for categorical distances that unifies existing measures and facilitates the creation of new, tailored distance functions for various data analysis tasks.
Findings
Framework incorporates multiple existing distances.
Enables development of new, flexible distance measures.
Improves classification performance by integrating response-predictor associations.
Abstract
The degree to which subjects differ from each other with respect to certain properties measured by a set of variables plays an important role in many statistical methods. For example, classification, clustering, and data visualization methods all require a quantification of differences in the observed values. We can refer to the quantification of such differences as distance. An appropriate definition of a distance depends on the nature of the data and the problem at hand. For distances between numerical variables, there exist many definitions that depend on the size of the observed differences. For categorical data, the definition of a distance is more complex, as there is no straightforward quantification of the size of the observed differences. Consequently, many proposals exist that can be used to measure differences based on categorical variables. In this paper, we introduce a…
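To make the abstract's point concrete, here is a minimal sketch of one way to quantify differences between observations on categorical variables: assign a dissimilarity to every pair of categories within each variable, then sum those values across variables. This is an illustration under stated assumptions, not the authors' implementation. The function names `categorical_distance` and `simple_matching_deltas`, the integer coding of categories, and the toy data are all hypothetical. With a dissimilarity of 1 for any mismatch and 0 for a match, the sketch reduces to the familiar simple matching distance.

```python
import numpy as np


def categorical_distance(X, deltas):
    """Sum per-variable category dissimilarities over all pairs of rows.

    X      : (n, p) integer array; X[i, j] is the category code of
             observation i on variable j (codes start at 0).
    deltas : list of p square arrays; deltas[j][a, b] is the assumed
             dissimilarity between categories a and b of variable j.
    """
    X = np.asarray(X)
    n, p = X.shape
    D = np.zeros((n, n))
    for j in range(p):
        codes = X[:, j]
        # Look up the category dissimilarity for every pair of observations.
        D += deltas[j][np.ix_(codes, codes)]
    return D


def simple_matching_deltas(n_categories):
    """Category dissimilarities: 1 for a mismatch, 0 for a match."""
    return [1.0 - np.eye(q) for q in n_categories]


# Toy data (hypothetical): 3 observations on 3 categorical variables
# with 2, 2, and 3 categories respectively.
X = np.array([[0, 1, 2],
              [0, 1, 0],
              [1, 0, 2]])

D = categorical_distance(X, simple_matching_deltas([2, 2, 3]))
print(D)
```

Replacing the matrices returned by `simple_matching_deltas` with other category dissimilarities (for example, values derived from category frequencies) changes the resulting distance without touching `categorical_distance`, which is the kind of flexibility the summary above describes.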
Taxonomy
Topics: Advanced Chemical Sensor Technologies · Sensory Analysis and Statistical Methods
