# On the Sample Complexity of HGR Maximal Correlation Functions for Large Datasets

**Authors:** Shao-Lun Huang, Xiangxiang Xu

arXiv: 1907.00393 · 2021-09-15

## TL;DR

This paper analyzes the sample complexity of estimating HGR maximal correlation functions with the ACE algorithm on large datasets. It provides error-exponent-based upper bounds, an optimal sampling strategy for semi-supervised learning, and supporting simulations.

## Contribution

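The paper develops a mathematical framework that characterizes the learning errors, and their error exponents, between the HGR maximal correlation functions of the true distribution and those estimated by the ACE algorithm. It also proposes an optimal sampling strategy for semi-supervised learning under a total sampling budget.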

## Key findings

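- Derived analytical expressions for the error exponents of the learning errors in both supervised and semi-supervised settings.
- Established upper bounds on the sample complexity, expressed through these error exponents.
- Proposed an optimal sampling strategy that maximizes the error exponent under a total sampling budget.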
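
As a generic illustration of how an error exponent can yield a sample complexity bound (a sketch, not the paper's exact statement): the symbols $n$ (number of samples), $\epsilon$ (error tolerance), $\delta$ (failure probability), and $E(\epsilon)$ (error exponent) are assumed here for illustration. Suppose that, for large $n$, the probability of the learning error exceeding $\epsilon$ decays as

$$
\Pr\{\text{learning error} > \epsilon\} \le \exp\bigl(-n\,E(\epsilon)\bigr), \qquad E(\epsilon) > 0 .
$$

Requiring the right-hand side to be at most $\delta$ and solving for $n$ gives a sufficient number of samples:

$$
n \ge \frac{1}{E(\epsilon)} \log\frac{1}{\delta} .
$$

Under this assumed form, a larger error exponent means fewer samples are needed, which is why a sampling strategy that maximizes the exponent is the natural objective.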

## Abstract

The Hirschfeld-Gebelein-Rényi (HGR) maximal correlation and the corresponding functions have been shown to be useful in many machine learning scenarios. In this paper, we study the sample complexity of estimating the HGR maximal correlation functions by the alternating conditional expectation (ACE) algorithm using training samples from large datasets. Specifically, we develop a mathematical framework to characterize the learning errors between the maximal correlation functions computed from the true distribution and the functions estimated from the ACE algorithm. For both supervised and semi-supervised learning scenarios, we establish the analytical expressions for the error exponents of the learning errors. Furthermore, we demonstrate that for large datasets, the upper bounds for the sample complexity of learning the HGR maximal correlation functions by the ACE algorithm can be expressed using the established error exponents. Moreover, with our theoretical results, we investigate the sampling strategy for different types of samples in semi-supervised learning with a total sampling budget constraint, and an optimal sampling strategy is developed to maximize the error exponent of the learning error. Finally, numerical simulations are presented to support our theoretical results.
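
To make the ACE procedure mentioned in the abstract concrete, here is a minimal sketch for discrete variables. It is an illustration under stated assumptions, not the paper's implementation: the function name `ace_discrete`, its parameters, and the fixed iteration count are hypothetical, and the input is assumed to be a joint probability table (for example, an empirical distribution built from samples) with strictly positive marginals and some dependence between X and Y.

```python
import numpy as np


def ace_discrete(joint, n_iter=500, seed=0):
    """Sketch of alternating conditional expectations for discrete X, Y.

    joint[x, y] is assumed to hold P(X=x, Y=y), e.g. an empirical joint
    distribution. Assumes all marginals are positive and X, Y are not
    independent (otherwise the normalization below divides by zero).
    Returns (f, g, rho): the estimated functions and their correlation.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize to a probability table
    px = joint.sum(axis=1)               # marginal of X
    py = joint.sum(axis=0)               # marginal of Y

    def standardize(h, p):
        # Enforce zero mean and unit variance under the distribution p.
        h = h - p @ h
        return h / np.sqrt(p @ h**2)

    rng = np.random.default_rng(seed)
    f = standardize(rng.standard_normal(px.size), px)

    for _ in range(n_iter):
        # g(y) <- E[f(X) | Y = y], then standardize.
        g = standardize((joint.T @ f) / py, py)
        # f(x) <- E[g(Y) | X = x], then standardize.
        f = standardize((joint @ g) / px, px)

    rho = f @ joint @ g                  # E[f(X) g(Y)]
    return f, g, rho


if __name__ == "__main__":
    # Uniform binary input through a binary symmetric channel, crossover 0.1.
    table = np.array([[0.45, 0.05],
                      [0.05, 0.45]])
    f, g, rho = ace_discrete(table)
    print(f, g, rho)
```

For the binary symmetric example in the `__main__` block, the maximal correlation is known to be 1 − 2(0.1) = 0.8, so the returned `rho` should be close to that value. Replacing `table` with an empirical table from n samples gives estimated functions whose deviation from the true ones is the learning error the paper studies.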

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.00393/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1907.00393/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1907.00393/full.md

---
Source: https://tomesphere.com/paper/1907.00393