Gower's similarity coefficients with automatic weight selection
Marcello D'Orazio

TL;DR
This paper introduces an automatic weight selection method for Gower's similarity coefficients that improves dissimilarity measurement for mixed-type variables in nearest-neighbor methods, with reported benefits in classification and missing-data imputation.
Contribution
It proposes a weighting scheme for Gower's dissimilarity that selects the weights by minimizing correlation differences, so that the variables in mixed-type data contribute more evenly to the overall dissimilarity.
Findings
Improved classification accuracy in simulations.
Enhanced missing value imputation performance.
Effective handling of mixed variable types.
Abstract
Nearest-neighbor methods have become popular in statistics and play a key role in statistical learning. Important decisions in nearest-neighbor methods concern the variables to use (when many potential candidates exist) and how to measure the dissimilarity between units. The first decision depends on the scope of the application, while the second depends mainly on the type of variables. Unfortunately, relatively few options can handle mixed-type variables, a situation frequently encountered in practical applications. The most popular dissimilarity for mixed-type variables is derived as the complement to one of Gower's similarity coefficient. It is appealing because it ranges between 0 and 1, is an average of the scaled dissimilarities calculated variable by variable, handles missing values, and allows a user-defined weighting scheme when averaging dissimilarities. The discussion…
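The properties listed in the abstract (values in [0, 1], a weighted average of per-variable scaled dissimilarities, tolerance of missing values) can be illustrated with a minimal sketch. It assumes the usual Gower convention of a range-scaled absolute difference for numeric variables and a simple mismatch indicator for categorical ones. The function name, argument names, and sample records are hypothetical, and the sketch takes user-supplied weights only; it does not implement the paper's automatic weight selection.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def gower_dissimilarity(a, b, kinds, ranges, weights=None):
    """Weighted Gower-style dissimilarity between two records (illustrative).

    kinds[k]   : "num" or "cat" for variable k (assumed labels)
    ranges[k]  : max - min of numeric variable k in the data; unused for "cat"
    weights[k] : non-negative user weight; defaults to 1 for every variable
    A variable that is missing in either record is skipped.
    """
    if weights is None:
        weights = [1.0] * len(kinds)
    numerator = 0.0
    denominator = 0.0
    for x, y, kind, rng, w in zip(a, b, kinds, ranges, weights):
        if _is_missing(x) or _is_missing(y):
            continue  # this variable does not contribute to the average
        if kind == "num":
            # scaled absolute difference, in [0, 1] when rng is the data range
            d = abs(x - y) / rng if rng and rng > 0 else 0.0
        else:
            # categorical: 0 if the categories match, 1 otherwise
            d = 0.0 if x == y else 1.0
        numerator += w * d
        denominator += w
    # undefined when no variable is observed in both records
    return numerator / denominator if denominator > 0 else float("nan")


# Hypothetical records: (age, colour, score); score is missing in the second.
kinds = ["num", "cat", "num"]
ranges = [50.0, None, 2.0]
rec_a = (30, "red", 1.5)
rec_b = (40, "blue", None)

unweighted = gower_dissimilarity(rec_a, rec_b, kinds, ranges)
weighted = gower_dissimilarity(rec_a, rec_b, kinds, ranges, weights=[2.0, 1.0, 1.0])
```

In this example the categorical mismatch contributes a full 1 while the numeric variable contributes only 0.2, so changing the weights shifts the result noticeably; this is the kind of unequal contribution that a weighting scheme is meant to address.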
Taxonomy
Topics: Multi-Criteria Decision Making · Face and Expression Recognition · Advanced Statistical Methods and Models
