A Unified Data Representation Learning for Non-parametric Two-sample Testing

Xunye Tian; Liuhua Peng; Zhijian Zhou; Mingming Gong; Arthur Gretton; Feng Liu

arXiv:2412.00613·cs.LG·May 9, 2025

Open Access

TL;DR

This paper introduces RL-TST (representation-learning two-sample testing), a framework for non-parametric two-sample testing that learns representations from the entire dataset while preserving Type-I error control, which improves test power.

Contribution

The paper proposes a unified framework that combines self-supervised and discriminative representation learning and uses the full dataset for more effective two-sample testing.

Findings

01. RL-TST outperforms existing methods in experiments.

02. Utilizes both data manifold and discriminative features.

03. Enhances test power while controlling Type-I errors.

Abstract

Learning effective data representations has been crucial in non-parametric two-sample testing. Common approaches first split the data into training and test sets and then learn data representations purely on the training set. However, recent theoretical studies have shown that, as long as the sample indexes are not used during the learning process, the whole dataset can be used to learn data representations while still controlling the Type-I error. This fact motivates us to use the test set (but without sample indexes) to facilitate data representation learning for testing. To this end, we propose a representation-learning two-sample testing (RL-TST) framework. RL-TST first performs purely self-supervised representation learning on the entire dataset to capture inherent representations (IRs) that reflect the underlying data manifold. A discriminative model is then trained…
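The index-free idea described in the abstract can be illustrated with a small sketch. This is not the RL-TST method: PCA on the pooled sample stands in for the self-supervised stage, the discriminative stage is omitted because the abstract is truncated at that point, and the MMD statistic, the Gaussian kernel, and all function names (`pooled_pca`, `mmd2_biased`, `permutation_mmd_test`) are assumptions chosen for illustration. The point it tries to show is that a representation computed from the pooled data, without knowing which sample each point came from, can be combined with a permutation test.

```python
import numpy as np

def pooled_pca(z, k):
    """Label-free representation: project onto the top-k principal
    directions of the pooled sample (a stand-in for self-supervised learning)."""
    zc = z - z.mean(axis=0)
    _, _, vt = np.linalg.svd(zc, full_matrices=False)
    return zc @ vt[:k].T

def mmd2_biased(kmat, idx_x, idx_y):
    """Biased MMD^2 estimate computed from a precomputed kernel matrix."""
    kxx = kmat[np.ix_(idx_x, idx_x)].mean()
    kyy = kmat[np.ix_(idx_y, idx_y)].mean()
    kxy = kmat[np.ix_(idx_x, idx_y)].mean()
    return kxx + kyy - 2.0 * kxy

def permutation_mmd_test(x, y, k=2, n_perm=200, seed=0):
    """Permutation two-sample test on features learned from the pooled data."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n = len(x)
    # The representation sees every point but never the sample indexes.
    feats = pooled_pca(z, k)
    sq = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    bandwidth = np.median(sq[sq > 0])  # median heuristic, also index-free
    kmat = np.exp(-sq / bandwidth)
    idx = np.arange(len(z))
    observed = mmd2_biased(kmat, idx[:n], idx[n:])
    # Count permuted statistics at least as large as the observed one.
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(idx)
        if mmd2_biased(kmat, p[:n], p[n:]) >= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=(50, 5))
    y = rng.normal(3.0, 1.0, size=(50, 5))
    stat, p = permutation_mmd_test(x, y)
    print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")
```

Because the features and the kernel bandwidth depend only on the pooled points and not on their group membership, the kernel matrix is the same for every relabeling, so the permuted statistics are comparable to the observed one. Under the null hypothesis the pooled sample is exchangeable, which is the usual argument for why such a permutation p-value keeps the Type-I error at the nominal level.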

Taxonomy

Topics: Face and Expression Recognition · Advanced Statistical Methods and Models · Fault Detection and Control Systems

Methods: Sparse Evolutionary Training