# Improving the Accuracy of Principal Component Analysis by the Maximum Entropy Method

**Authors:** Guihong Wan, Crystal Maung, Haim Schweitzer

arXiv: 1907.11094 · 2019-07-26

## TL;DR

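This paper refines classical PCA by modeling the uncertainty of its approximations with random variables and applying the Maximum Entropy Method to estimate distances between data points. In most cases the resulting distance estimates are more accurate than the classical ones, which helps tasks such as approximate nearest neighbor search.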

## Contribution

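It combines PCA with the Maximum Entropy Method: the probability distribution underlying the PCA approximation is inferred by maximum entropy, and the expected distance between the resulting random variables is used as an improved distance estimate.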

## Key findings

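- Distance estimates are more accurate than those of classical PCA in most cases.
- The improvement is supported by both analysis and experiments.
- The method applies wherever distances are computed from a PCA approximation, for example approximate nearest neighbor search.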

## Abstract

Classical Principal Component Analysis (PCA) approximates data in terms of projections on a small number of orthogonal vectors. There are simple procedures to efficiently compute various functions of the data from the PCA approximation. The most important function is arguably the Euclidean distance between data items. This can be used, for example, to solve the approximate nearest neighbor problem. We use random variables to model the inherent uncertainty in such approximations, and apply the Maximum Entropy Method to infer the underlying probability distribution. We propose using the expected values of distances between these random variables as improved estimates of the distance. We show by analysis and experimentally that in most cases results obtained by our method are more accurate than those obtained by the classical approach. This improves the accuracy of a classical technique that has been used with little change for over 100 years.
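
To make the classical distance estimate described in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not taken from the paper: the helper names (`fit_pca`, `project`, `residual_norms`), the synthetic Gaussian data, and the choice of `k = 5` are all assumptions made for illustration. The classical part shows that distances measured between projections can only underestimate the true distance. The last estimate, `corrected_d`, is a hypothetical illustration of the "expected distance" idea under one simple assumption (the unknown residual directions are modeled as random, so their cross term averages to zero); the paper's actual maximum entropy derivation may differ.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean and the top-k principal directions (as columns)."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal right singular vectors of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def project(X, mean, V):
    """Coordinates of each row of X in the k-dimensional PCA subspace."""
    return (X - mean) @ V

def residual_norms(X, mean, V):
    """Norm of the part of each centered row that the subspace does not capture."""
    Xc = X - mean
    return np.linalg.norm(Xc - (Xc @ V) @ V.T, axis=1)

# Synthetic data, used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

mean, V = fit_pca(X, k=5)
Z = project(X, mean, V)          # compact representation kept by classical PCA
r = residual_norms(X, mean, V)   # one extra scalar per data item

i, j = 0, 1
true_d = np.linalg.norm(X[i] - X[j])

# Classical estimate: distance between the two projections.
classical_d = np.linalg.norm(Z[i] - Z[j])

# Hypothetical correction (an assumption, not the paper's formula): if the
# residual directions are treated as random with zero expected inner product,
# the expected squared distance adds the two squared residual norms.
corrected_d = np.sqrt(classical_d**2 + r[i]**2 + r[j]**2)

print(f"true={true_d:.3f} classical={classical_d:.3f} corrected={corrected_d:.3f}")
```

Because the columns of `V` are orthonormal, `classical_d` never exceeds `true_d`; the gap is exactly the distance between the two discarded residual vectors. Any correction that accounts for this discarded part, such as the hypothetical one above, needs only a small amount of extra information per data item beyond the projection itself.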

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.11094/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1907.11094/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1907.11094/full.md

---
Source: https://tomesphere.com/paper/1907.11094