Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology
Javier González-Delgado, Alberto González-Sanz, Juan Cortés, and Pierre Neuvial

TL;DR
This paper develops two-sample goodness-of-fit tests on the flat torus based on the Wasserstein distance, extends optimal transport theory to that setting, and applies the tests to local protein structure data.
Contribution
It introduces new Wasserstein-based testing methods on the flat torus, extending optimal transport theory with a Central Limit Theorem, and demonstrates their application to protein structure data.
Findings
Validated the proposed tests through numerical experiments.
Proved the tests' validity and consistency.
Applied tests to real protein data.
Abstract
This work is motivated by the study of local protein structure, which is defined by two variable dihedral angles that take values from probability distributions on the flat torus. Our goal is to provide the space with a metric that quantifies local structural modifications due to changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Due to its adaptability to the space geometry, we focus on the Wasserstein distance as a metric between distributions. We extend existing results of the theory of Optimal Transport to the $d$-dimensional flat torus $\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$, in particular a Central Limit Theorem. Moreover, we propose different approaches for two-sample goodness-of-fit testing for the one and two-dimensional case, based on the Wasserstein distance. We prove their…
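To make the central object of the abstract concrete, here is a minimal Python sketch of the empirical squared 2-Wasserstein distance between two equal-size samples on the flat torus, with the torus represented as [0, 1)^d. It is an illustration under stated assumptions, not the authors' procedure: the function names (`torus_sq_dist`, `wasserstein2_sq`, `permutation_pvalue`) are hypothetical, the optimal transport problem is solved as an assignment problem with SciPy (which is valid only for samples of equal size with uniform weights), and the p-value comes from a generic permutation test rather than from the tests proposed in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def torus_sq_dist(x, y):
    """Pairwise squared geodesic distances on the flat torus [0, 1)^d."""
    diff = np.abs(x[:, None, :] - y[None, :, :]) % 1.0
    diff = np.minimum(diff, 1.0 - diff)  # shortest way around each coordinate
    return (diff ** 2).sum(axis=-1)


def wasserstein2_sq(x, y):
    """Squared 2-Wasserstein distance between two equal-size empirical samples.

    Assumption: len(x) == len(y), so the optimal plan is a one-to-one matching.
    """
    cost = torus_sq_dist(x, y)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Generic permutation p-value (illustrative; not the paper's tests)."""
    rng = np.random.default_rng(seed)
    observed = wasserstein2_sq(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = wasserstein2_sq(pooled[idx[:n]], pooled[idx[n:]])
        count += stat >= observed
    return (count + 1) / (n_perm + 1)
```

Dihedral angles given in radians in [-π, π) can be mapped to this representation with `(theta + np.pi) / (2 * np.pi)`. The wrap-around in `torus_sq_dist` is what distinguishes the torus from the unit square: two points at 0.05 and 0.95 in one coordinate are at distance 0.1, not 0.9.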
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Random Matrices and Applications · Statistical Methods and Inference
