# Inference of an explanatory variable from observations in a high-dimensional space: application to high-resolution spectra of stars

**Authors:** V. Watson (UPS-OMP-CNRS-IRAP, Toulouse), JF. Trouilhet (UPS-OMP-CNRS-IRAP, Toulouse), F. Paletou (UPS-OMP-CNRS-IRAP, Toulouse), S. Girard (MISTIS, INRIA Grenoble)

arXiv: 1706.02213 · 2017-06-08

## TL;DR

This paper evaluates methods for inferring stellar parameters from high-dimensional spectral data and shows, on synthetic spectra, that supervised dimension reduction (Sliced Inverse Regression with selected directions) yields much smaller inference errors than PCA.

## Contribution

The study proposes a supervised dimension-reduction approach based on Sliced Inverse Regression (SIR), combined with a selection of the SIR directions most relevant to the parameter, which improves the accuracy of parameter inference from high-dimensional stellar spectra.
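For reference, the standard SIR criterion (slice along the parameter, maximize the inter-slice variance while standardizing the total projected variance) can be written as below. The notation is ours, not the summary's: $x_i \in \mathbb{R}^p$ are the $n$ spectra, $y_i$ their parameter values, and the database is cut into $H$ slices $S_h$ of $n_h$ spectra each, along sorted $y$.

$$
\bar{x}_h = \frac{1}{n_h}\sum_{i \in S_h} x_i, \qquad
\Sigma = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})^\top, \qquad
\Gamma = \sum_{h=1}^{H}\frac{n_h}{n}(\bar{x}_h-\bar{x})(\bar{x}_h-\bar{x})^\top
$$

$$
\max_{v}\; v^\top \Gamma v \quad \text{subject to} \quad v^\top \Sigma v = 1
\quad\Longrightarrow\quad \Gamma v = \lambda\, \Sigma v .
$$

Because the total covariance splits into between-slice and within-slice parts, each eigenvalue $\lambda \in [0, 1]$ is the fraction of projected variance carried by the slice means along $v$. Standard SIR keeps the directions with the largest $\lambda$; the approach summarized here instead selects among them.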

## Key findings

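- Compared to PCA, SIR with a selected direction reduces the inference error by 80% on a non-linear parameter and by 95% on a linear parameter.
- Even with a single standard SIR direction, the error is 50% (non-linear parameter) and 70% (linear parameter) smaller than with PCA.
- Standard SIR is limited by the non-monotonic link between the data and the parameter and by noise propagation; selecting the most relevant directions mitigates this.
- Results so far come from preliminary tests on synthetic pseudo-line profiles with added noise.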
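The percentages compare each method's inference error with that of PCA. Assuming they are relative reductions of a common error measure $\varepsilon$ (the summary does not name the measure), they read as

$$
r = 1 - \frac{\varepsilon_{\mathrm{SIR}}}{\varepsilon_{\mathrm{PCA}}}, \qquad
r = 0.95 \;\Rightarrow\; \varepsilon_{\mathrm{SIR}} = 0.05\,\varepsilon_{\mathrm{PCA}} = \frac{\varepsilon_{\mathrm{PCA}}}{20},
$$

so the 80% figure for the non-linear parameter corresponds to an error five times smaller than PCA's.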

## Abstract

Our aim is to evaluate fundamental parameters from the analysis of the electromagnetic spectra of stars. We may use $10^3$ to $10^5$ spectra, each spectrum being a vector with $10^2$ to $10^4$ coordinates. We thus face the so-called "curse of dimensionality", and look for a method that reduces the size of this data space while keeping only the most relevant information.

As a reference method, we use principal component analysis (PCA) to reduce dimensionality. However, PCA is an unsupervised method, so the subspace it produces was not consistent with the parameter. We therefore tested a supervised method based on Sliced Inverse Regression (SIR), which provides a subspace consistent with the parameter. SIR also shares analogies with factorial discriminant analysis: the method slices the database along the parameter variation and builds the subspace that maximizes the inter-slice variance, while standardizing the total projected variance of the data.

Nevertheless, the performance of SIR in standard usage was not satisfactory, because of the non-monotonicity of the unknown function linking the data to the parameter, and because of noise propagation. We show that better performance can be achieved by selecting the most relevant directions for parameter inference.

Preliminary tests are performed on synthetic pseudo-line profiles plus noise. Using one direction, we show that, compared to PCA, the error associated with SIR is 50$\%$ smaller on a non-linear parameter and 70$\%$ smaller on a linear parameter. Moreover, using a selected direction, the error is 80$\%$ smaller for a non-linear parameter and 95$\%$ smaller for a linear parameter.
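The pipeline described in the abstract (PCA baseline, SIR slicing, direction selection, inference in the reduced space) can be sketched in NumPy as follows. This is not the authors' code, and several pieces are assumptions: equal-count slices, a k-nearest-neighbour regressor for inference, validation RMSE as the selection rule (the abstract does not say how directions are selected), and a toy Gaussian pseudo-line generator. All function names are hypothetical.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, ridge=1e-8):
    """Return SIR eigenvalues and directions (columns of V), largest first.

    Solves Gamma v = lambda Sigma v, where Sigma is the total covariance of X
    and Gamma the weighted covariance of the slice means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n + ridge * np.eye(p)  # small ridge keeps Sigma invertible
    # Slice the database along the sorted parameter values (equal-count slices).
    gamma = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(m, m)
    # Whiten with the Cholesky factor Sigma = L L^T, then solve a symmetric problem.
    L_inv = np.linalg.inv(np.linalg.cholesky(sigma))
    evals, W = np.linalg.eigh(L_inv @ gamma @ L_inv.T)
    V = L_inv.T @ W  # back-transform v = L^{-T} w, so that v^T Sigma v = 1
    order = np.argsort(evals)[::-1]
    return evals[order], V[:, order]


def pca_directions(X):
    """Principal axes of X (unsupervised baseline), largest variance first."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt.T


def knn_predict(Z_train, y_train, Z_test, k=5):
    """Average the parameter of the k nearest training points in projected space."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def select_direction(X_tr, y_tr, X_val, y_val, V, n_candidates=5, k=5):
    """Hypothetical rule: keep the candidate direction with the lowest validation RMSE."""
    mu = X_tr.mean(axis=0)
    errors = []
    for j in range(min(n_candidates, V.shape[1])):
        v = V[:, [j]]  # one direction, kept as a (p, 1) matrix
        pred = knn_predict((X_tr - mu) @ v, y_tr, (X_val - mu) @ v, k)
        errors.append(float(np.sqrt(np.mean((pred - y_val) ** 2))))
    return int(np.argmin(errors)), errors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 1500, 60
    wl = np.linspace(-1.0, 1.0, p)  # toy wavelength grid

    def profile(centre, width=0.05):
        # Hypothetical Gaussian pseudo-line profile on the grid
        return np.exp(-0.5 * ((wl - centre) / width) ** 2)

    y = rng.uniform(0.0, 1.0, n)         # parameter to infer
    nuisance = rng.uniform(0.0, 3.0, n)  # larger variation unrelated to y
    X = (1.0 - 0.3 * np.outer(y, profile(-0.4))
         - 0.3 * np.outer(nuisance, profile(0.4))
         + 0.01 * rng.standard_normal((n, p)))
    tr, va, te = slice(0, 900), slice(900, 1200), slice(1200, n)

    _, V_sir = sir_directions(X[tr], y[tr])
    best, _ = select_direction(X[tr], y[tr], X[va], y[va], V_sir)
    V_pca = pca_directions(X[tr])
    mu = X[tr].mean(axis=0)
    for name, v in [("PCA, 1st direction", V_pca[:, [0]]),
                    ("SIR, 1st direction", V_sir[:, [0]]),
                    ("SIR, selected direction", V_sir[:, [best]])]:
        pred = knn_predict((X[tr] - mu) @ v, y[tr], (X[te] - mu) @ v)
        rmse = np.sqrt(np.mean((pred - y[te]) ** 2))
        print(f"{name}: test RMSE = {rmse:.4f}")
```

In the toy data, the nuisance line carries more variance than the line tied to `y`, which is the situation where an unsupervised PCA axis can miss the parameter. Ranking candidate SIR directions by held-out error, rather than by eigenvalue alone, is one simple way to discard directions dominated by noise.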

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.02213/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1706.02213/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/1706.02213/full.md

---
Source: https://tomesphere.com/paper/1706.02213