# Improving the Projection of Global Structures in Data through Spanning Trees

**Authors:** Daniel Alcaide and Jan Aerts

arXiv: 1907.05783 · 2019-07-15

## TL;DR

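STAD (Spanning Trees as Approximation of Data) is a dimensionality reduction method that starts from the minimum spanning tree of a dataset and adds edges until the distances in the graph correlate as closely as possible with the distances in the original high-dimensional space. The resulting graph serves as a coordinate-free representation for visualizing and exploring the data.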

## Contribution

This work introduces STAD, a graph-based method that approximates the structure of high-dimensional data by extending a minimum spanning tree, so that the data can be explored with or without formulating prior hypotheses.

## Key findings

- Applied to two real-world datasets: traffic density in Barcelona and temporal air quality measurements in Castile and León, Spain
- Preserves original data distances more effectively than traditional methods
- Supports additional functions that focus the exploration and emphasize traits in the data that would otherwise remain hidden

## Abstract

The connection of edges in a graph generates a structure that is independent of a coordinate system. This visual metaphor allows creating a more flexible representation of data than a two-dimensional scatterplot. In this work, we present STAD (Spanning Trees as Approximation of Data), a dimensionality reduction method to approximate the high-dimensional structure into a graph with or without formulating prior hypotheses. STAD generates an abstract representation of high-dimensional data by giving each data point a location in a graph which preserves the distances in the original high-dimensional space. The STAD graph is built upon the Minimum Spanning Tree (MST) to which new edges are added until the correlation between the distances from the graph and the original dataset is maximized. Additionally, STAD supports the inclusion of additional functions to focus the exploration and allow the analysis of data from new perspectives, emphasizing traits in data which otherwise would remain hidden. We demonstrate the effectiveness of our method by applying it to two real-world datasets: traffic density in Barcelona and temporal measurements of air quality in Castile and León in Spain.
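
The construction described in the abstract (start from the MST, add edges, keep the graph whose distances correlate best with the original ones) can be illustrated with a small Python sketch. This is not the authors' implementation. Several choices below are assumptions made only for illustration: the function name `stad_sketch` and its `candidate_counts` parameter are hypothetical, distances are Euclidean, extra edges are added from shortest to longest original distance, graph distance is measured as an unweighted hop count, the correlation is Pearson's, and the search simply tries each candidate edge count rather than using an optimizer.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def stad_sketch(points, candidate_counts):
    """Toy STAD-like graph: returns (adjacency, best_count, best_correlation).

    Assumes all points are distinct, so every pairwise distance is > 0.
    """
    dist = squareform(pdist(points))      # pairwise Euclidean distances (assumed metric)
    n = dist.shape[0]
    iu = np.triu_indices(n, k=1)          # index of each unordered pair (i < j)

    # Start from the minimum spanning tree of the complete distance graph.
    mst = minimum_spanning_tree(csr_matrix(dist)).toarray()
    in_mst = (mst + mst.T) > 0            # symmetric boolean adjacency of the MST

    # Non-MST pairs, ordered from shortest to longest original distance (assumed order).
    order = np.argsort(dist[iu], kind="stable")
    extra = [
        (iu[0][k], iu[1][k])
        for k in order
        if not in_mst[iu[0][k], iu[1][k]]
    ]

    best = (None, -1, -np.inf)
    for count in candidate_counts:
        adj = in_mst.copy()
        for i, j in extra[:count]:
            adj[i, j] = adj[j, i] = True
        # Graph distance = number of edges on the shortest path (assumed unit weights).
        hops = shortest_path(
            csr_matrix(adj.astype(float)), directed=False, unweighted=True
        )
        # Pearson correlation between original distances and graph distances.
        corr = np.corrcoef(dist[iu], hops[iu])[0, 1]
        if corr > best[2]:
            best = (adj, count, corr)
    return best
```

Calling `stad_sketch(points, range(0, 50, 5))` on an `(n, d)` array returns the adjacency matrix with the highest correlation among the tried edge counts. Candidate counts should stay well below the total number of non-MST pairs: once the graph becomes complete every hop distance equals 1 and the correlation is undefined.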

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.05783/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1907.05783/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/1907.05783/full.md

---
Source: https://tomesphere.com/paper/1907.05783