Information loss from dimensionality reduction in 5D-Gaussian spectral data

A. Schelle, H. Lüling

arXiv:2301.11923 · physics.data-an · March 28, 2025

Open Access

TL;DR

This paper quantifies how much information is lost when five-dimensional Gaussian spectral data is projected onto two-dimensional diagrams. For small datasets of a few hundred samples, the relative loss is below one percent. It also examines how the entropy distribution changes as the sample size grows.

Contribution

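It provides an elementary Shannon entropy analysis of dimensionality reduction in spectral data, showing a relative information loss below one percent for small datasets and describing how the density and expectation value of the entropy distribution change with sample size.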

Findings

01. Less than 1% information loss in 2D projection for small datasets

02. Entropy distribution density increases with sample size

03. Entropy expectation value grows with larger sample sizes

Abstract

Understanding the loss of information in spectral analytics is a crucial first step towards finding root causes for failures and uncertainties using spectral data in artificial intelligence models built from modern complex data science applications. Here, we show from an elementary Shannon entropy model analysis with quantum statistics of Gaussian-distributed spectral data that the relative loss of information from dimensionality reduction due to the projection of an initial five-dimensional dataset onto two-dimensional diagrams is less than one percent in the parameter range of small data sets with sample sizes on the order of a few hundred data samples. From our analysis, we also conclude that the density and expectation value of the entropy probability distribution increase with the sample number and sample size using artificial data models derived from random sampling Monte Carlo…
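As a generic illustration of comparing entropy before and after a projection, the sketch below fits a Gaussian to randomly sampled five-dimensional data and evaluates the standard closed-form differential entropy of that fit, once for all five coordinates and once for two of them. This is not the authors' Shannon entropy model with quantum statistics, whose details are not given on this page, and it is not expected to reproduce the under-one-percent figure, which depends on the paper's own definition of relative information loss. The function names, the random seed, the sample size of 300, and the choice of which two coordinates to keep are all assumptions made for illustration.

```python
import numpy as np


def gaussian_entropy(samples):
    """Differential entropy (in nats) of a Gaussian fitted to the samples.

    Uses the closed form h = 0.5 * (d * ln(2*pi*e) + ln det(Sigma)),
    with Sigma estimated by the sample covariance matrix.
    """
    samples = np.asarray(samples, dtype=float)
    d = samples.shape[1]
    cov = np.atleast_2d(np.cov(samples, rowvar=False))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("sample covariance is not positive definite")
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)


def entropy_after_projection(samples, keep=(0, 1)):
    """Return (entropy of the full data, entropy of the kept coordinates).

    The projection here simply drops coordinates (an assumption);
    the paper's projection onto 2D diagrams may be defined differently.
    """
    samples = np.asarray(samples, dtype=float)
    projected = samples[:, list(keep)]
    return gaussian_entropy(samples), gaussian_entropy(projected)


# Hypothetical artificial dataset: 300 samples from a 5D standard normal.
rng = np.random.default_rng(0)
data = rng.standard_normal((300, 5))

h5, h2 = entropy_after_projection(data)
print(f"5D Gaussian-fit entropy: {h5:.3f} nats")
print(f"2D Gaussian-fit entropy: {h2:.3f} nats")
```

Repeating the sampling step with different seeds and sample sizes gives a distribution of entropy values, which is one simple way to explore how such estimates behave as the sample size changes.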

Taxonomy

Topics: Neural Networks and Applications