Two-sample test based on Self-Organizing Maps

Alejandro Álvarez-Ayllón; Manuel Palomo-Duarte; Juan-Manuel Dodero

arXiv:2212.08960 · cs.LG · December 20, 2022

TL;DR

This paper proposes using Self-Organizing Maps as a two-sample test that not only detects differences between samples but also provides insights into how they differ, combining classification with interpretability.

Contribution

The paper introduces a novel approach that uses Self-Organizing Maps for two-sample testing and offers both discrimination and interpretability.

Findings

1. The SOM-based test can effectively distinguish samples drawn from different populations.

2. It provides insight into how the samples differ through visualization.

3. It combines classification accuracy with interpretability.

Abstract

Machine-learning classifiers can be leveraged as a two-sample statistical test. Suppose each sample is assigned a different label and a classifier can discriminate between them with better-than-chance accuracy. In that case, we can infer that the two samples originate from different populations. However, many types of models, such as neural networks, behave as black boxes for the user: they can reject the hypothesis that both samples originate from the same population, but they offer no insight into how the samples differ. Self-Organizing Maps are a dimensionality-reduction technique initially devised as a data visualization tool; they display emergent properties and are also useful for classification tasks. Since they can be used as classifiers, they can also be used as a two-sample statistical test. And because their original purpose is visualization, they can also offer insight into how the samples differ.
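The argument above (label each sample, train a classifier, and treat better-than-chance held-out accuracy as evidence that the populations differ) can be sketched with a SOM as the classifier. This is a minimal, illustrative pure-Python sketch, not the authors' implementation. The 4x4 grid, Gaussian neighbourhood, linearly decaying learning rate, majority-vote node labelling, 50/50 holdout split, and permutation test for the p-value are all assumptions, and every function name (`train_som`, `som_two_sample_test`, and so on) is hypothetical.

```python
# Hypothetical sketch of a SOM-based classifier two-sample test.
# Design choices (grid size, schedules, majority vote, permutation test) are
# assumptions for illustration, not the method described in the paper.
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest to x (squared Euclidean distance)."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols Self-Organizing Map on unlabeled vectors."""
    rng = random.Random(seed)
    # Initialise every node's weight vector with a copy of a random data point.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def node_counts(bmus, labels, idx):
    """Per-node [count of sample A, count of sample B] over the points in idx."""
    counts = {}
    for i in idx:
        counts.setdefault(bmus[i], [0, 0])[labels[i]] += 1
    return counts


def holdout_accuracy(bmus, labels, train_i, test_i):
    """Label each node by majority vote on training points, then score test points."""
    node_label = {n: int(c[1] > c[0]) for n, c in node_counts(bmus, labels, train_i).items()}
    hits = sum(node_label.get(bmus[i], 0) == labels[i] for i in test_i)
    return hits / len(test_i)


def som_two_sample_test(sample_a, sample_b, n_perm=200, seed=0):
    """Return (held-out accuracy, permutation p-value, per-node counts)."""
    rng = random.Random(seed)
    points = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    idx = list(range(len(points)))
    rng.shuffle(idx)
    train_i, test_i = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    # The SOM never sees the labels, so one trained map serves every permutation.
    weights = train_som([points[i] for i in train_i], seed=seed)
    bmus = [best_matching_unit(weights, p) for p in points]
    observed = holdout_accuracy(bmus, labels, train_i, test_i)
    exceed = 0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        exceed += holdout_accuracy(bmus, perm, train_i, test_i) >= observed
    p_value = (exceed + 1) / (n_perm + 1)
    # Per-node counts over all points show where on the map the samples differ.
    return observed, p_value, node_counts(bmus, labels, idx)


if __name__ == "__main__":
    rng = random.Random(1)
    a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
    b = [[rng.gauss(1.5, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
    acc, p, counts = som_two_sample_test(a, b)
    print(f"held-out accuracy = {acc:.2f}, permutation p-value = {p:.3f}")
    for node, (n_a, n_b) in sorted(counts.items()):
        print(node, "A:", n_a, "B:", n_b)
```

Because the map is trained without labels, permuting the labels only changes the per-node majority votes, which keeps the permutation loop cheap. The per-node A/B counts are a rough stand-in for the visual inspection the abstract describes: nodes dominated by one sample point to regions of feature space where the samples differ.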

Taxonomy

Topics: Neural Networks and Applications