# Wasserstein Projection Pursuit of Non-Gaussian Signals

**Authors:** Satyaki Mukherjee, Soumendu Sundar Mukherjee, Debarghya Ghoshdastidar

arXiv: 2302.12693 · 2023-02-27

## TL;DR

This paper introduces a projection pursuit method that uses the 2-Wasserstein distance to locate a low-dimensional non-Gaussian subspace in high-dimensional data, with statistical guarantees in the regime where the data dimensionality is comparable to the sample size.

## Contribution

It develops a rigorous statistical framework for Wasserstein-based projection pursuit in high dimensions, proving guarantees on subspace recovery under a generative model with an unknown low-dimensional non-Gaussian subspace.

## Key findings

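- Proves bounds on how accurately the directions found by projection pursuit approximate the unknown non-Gaussian subspace.
- The guarantees hold when the data dimensionality is comparable to the sample size.
- Complements recent results showing that projection pursuit cannot reliably locate interesting directions when the data dimensionality is much larger than the sample size.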

## Abstract

We consider the general dimensionality reduction problem of locating, in a high-dimensional data cloud, a $k$-dimensional non-Gaussian subspace of interesting features. We use a projection pursuit approach: we search for mutually orthogonal unit directions which maximise the 2-Wasserstein distance of the empirical distribution of data-projections along these directions from a standard Gaussian. Under a generative model, where there is an underlying (unknown) low-dimensional non-Gaussian subspace, we prove rigorous statistical guarantees on the accuracy of approximating this unknown subspace by the directions found by our projection pursuit approach. Our results operate in the regime where the data dimensionality is comparable to the sample size, and thus supplement the recent literature on the non-feasibility of locating interesting directions via projection pursuit in the complementary regime where the data dimensionality is much larger than the sample size.
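
The projection pursuit objective can be made concrete for a single direction. In one dimension the optimal coupling between the empirical distribution of the projections and $\mathcal N(0,1)$ is the sorted (quantile) coupling, so the squared 2-Wasserstein distance has a closed form. With sorted projections $x_{(1)} \le \dots \le x_{(n)}$, $z_i = \Phi^{-1}(i/n)$ and $\varphi(z_0) = \varphi(z_n) = 0$:

$$W_2^2\big(\hat\mu_n, \mathcal N(0,1)\big) = \frac{1}{n}\sum_{i=1}^{n} x_{(i)}^2 - 2\sum_{i=1}^{n} x_{(i)}\big[\varphi(z_{i-1}) - \varphi(z_i)\big] + 1,$$

using $\int_a^b \Phi^{-1}(u)\,du = \varphi(\Phi^{-1}(a)) - \varphi(\Phi^{-1}(b))$ and $\int_0^1 \Phi^{-1}(u)^2\,du = 1$. The Python below is a minimal sketch built on that formula, not the authors' procedure. It assumes the data are already centred and whitened, finds directions greedily one at a time (each restricted to the orthogonal complement of those already found), and uses a crude random-restart hill climber as the optimiser; the paper may instead search jointly over all $k$ directions. All function names (`w2_sq_to_std_gaussian`, `pursue_directions`, and so on) are hypothetical. Maximising $W_2^2$ is equivalent to maximising $W_2$.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian N(0, 1)


def gaussian_quantile_weights(n):
    """w[i] = integral of Phi^{-1}(u) over [i/n, (i+1)/n], i = 0..n-1.

    Uses int_a^b Phi^{-1}(u) du = phi(Phi^{-1}(a)) - phi(Phi^{-1}(b)),
    with phi = 0 at the endpoints u = 0 and u = 1.
    """
    dens = [0.0] + [_STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(i / n)) for i in range(1, n)] + [0.0]
    return [dens[i] - dens[i + 1] for i in range(n)]


def w2_sq_to_std_gaussian(samples, weights=None):
    """Exact squared 2-Wasserstein distance between the empirical law of
    `samples` and N(0, 1), via the 1-D quantile (sorted) coupling."""
    xs = sorted(samples)
    n = len(xs)
    if weights is None:
        weights = gaussian_quantile_weights(n)
    second_moment = sum(x * x for x in xs) / n
    cross = sum(x * w for x, w in zip(xs, weights))
    # E[X^2] - 2 E[X * Phi^{-1}(U)] + E[Z^2], where E[Z^2] = 1.
    return max(0.0, second_moment - 2.0 * cross + 1.0)


def _normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def _deflate(v, basis):
    """Remove the components of v along the (orthonormal) vectors in basis."""
    for b in basis:
        dot = sum(vi * bi for vi, bi in zip(v, b))
        v = [vi - dot * bi for vi, bi in zip(v, b)]
    return v


def pursue_directions(data, k, restarts=5, iters=200, seed=0):
    """Hypothetical greedy pursuit: k mutually orthogonal unit directions,
    each maximising W2^2 of the projected data, found by random-restart
    hill climbing (an illustrative optimiser, not the paper's)."""
    rng = random.Random(seed)
    d = len(data[0])
    weights = gaussian_quantile_weights(len(data))  # depends only on n

    def score(u):
        proj = [sum(ui * xi for ui, xi in zip(u, x)) for x in data]
        return w2_sq_to_std_gaussian(proj, weights)

    basis = []
    for _ in range(k):
        best_u, best_s = None, -1.0
        for _ in range(restarts):
            u = _normalize(_deflate([rng.gauss(0.0, 1.0) for _ in range(d)], basis))
            s, step = score(u), 0.5
            for _ in range(iters):
                noisy = [ui + step * rng.gauss(0.0, 1.0) for ui in u]
                cand = _normalize(_deflate(noisy, basis))
                cand_s = score(cand)
                if cand_s > s:
                    u, s = cand, cand_s
                else:
                    step *= 0.98  # shrink the step after a rejected move
            if s > best_s:
                best_u, best_s = u, s
        basis.append(best_u)
    return basis
```

As a usage example, on data whose first coordinate is a fair ±1 coin flip and whose remaining coordinates are independent N(0, 1) noise, `pursue_directions(data, k=1)` should return a direction close to the first coordinate axis. Because the one-dimensional distance has a closed form, each objective evaluation costs only a sort, O(n log n); the hill climber is a placeholder that a proper optimiser over the unit sphere could replace.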

---
Source: https://tomesphere.com/paper/2302.12693