# Estimating the effective dimension of large biological datasets using Fisher separability analysis

**Authors:** Luca Albergante, Jonathan Bac, Andrei Zinovyev

arXiv: 1901.06328 · 2019-01-21

## TL;DR

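This paper evaluates a Fisher separability-based estimator of intrinsic dimension (ID) on several benchmarks and real biological datasets. It reports performance competitive with state-of-the-art estimators across a wide range of dimensions, better results on noisy samples, and usable estimates even when the data do not lie on a manifold.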

## Contribution

It tests a recently introduced intrinsic dimension estimator, based on the separability properties of data points, and shows that it is competitive and efficient across benchmark and real biological datasets, including noisy ones.

## Key findings

- Performs competitively with state-of-the-art ID measures across a wide range of dimensions
- Performs better than those measures on noisy samples
- Produces an ID estimate even when the intrinsic manifold assumption is not valid

## Abstract

Modern large-scale datasets are frequently said to be high-dimensional. However, their data point clouds frequently possess structures, significantly decreasing their intrinsic dimensionality (ID) due to the presence of clusters, points being located close to low-dimensional varieties or fine-grained lumping. We test a recently introduced dimensionality estimator, based on analysing the separability properties of data points, on several benchmarks and real biological datasets. We show that the introduced measure of ID has performance competitive with state-of-the-art measures, being efficient across a wide range of dimensions and performing better in the case of noisy samples. Moreover, it allows estimating the intrinsic dimension in situations where the intrinsic manifold assumption is not valid.
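The abstract does not spell out the estimator, so the following is only a minimal sketch of how a separability-based ID estimate can be computed, not the authors' implementation. It assumes the data are centered, PCA-whitened, and projected onto the unit sphere; that a point `x` counts as inseparable from `y` when their inner product exceeds a threshold `alpha`; and that the mean inseparability probability is matched to the value expected for a uniform distribution on an n-dimensional unit sphere, which is then solved for n with the Lambert W function. The function name `fisher_id_sketch`, the default `alpha=0.8`, and the `cond_number=10.0` cutoff for discarding weak principal components are illustrative assumptions.

```python
import numpy as np
from scipy.special import lambertw


def fisher_id_sketch(X, alpha=0.8, cond_number=10.0):
    """Rough separability-based intrinsic dimension estimate (illustrative only).

    X           : array of shape (n_points, n_features)
    alpha       : separability threshold in (0, 1); assumed default
    cond_number : keep principal components whose variance is within this
                  factor of the largest one; assumed default
    """
    X = np.asarray(X, dtype=float)
    n_points = X.shape[0]

    # Center the data and compute principal components via SVD.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)

    # Drop components with negligible variance (eigenvalues are s**2 / (n - 1)).
    keep = s**2 > s[0] ** 2 / cond_number

    # Whitened coordinates: unit variance along every retained component.
    Z = U[:, keep] * np.sqrt(n_points - 1)

    # Project every point onto the unit sphere.
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)

    # On the unit sphere, x is inseparable from y when <x, y> > alpha.
    G = Z @ Z.T
    np.fill_diagonal(G, -np.inf)  # ignore self-comparisons
    p_point = (G > alpha).sum(axis=1) / (n_points - 1)
    p_mean = p_point.mean()
    if p_mean == 0.0:
        return float("nan")  # every point is separable; no finite estimate

    # Assumed reference: for a uniform distribution on the unit sphere in R^n,
    #   p ~ (1 - alpha^2)^((n - 1) / 2) / (alpha * sqrt(2 * pi * n)).
    # Solving this for n gives n = W(c / (2 pi p^2 alpha^2 (1 - alpha^2))) / c
    # with c = -ln(1 - alpha^2) and W the Lambert W function.
    c = -np.log(1.0 - alpha**2)
    arg = c / (2.0 * np.pi * p_mean**2 * alpha**2 * (1.0 - alpha**2))
    return float(lambertw(arg).real / c)
```

As a sanity check, calling `fisher_id_sketch` on a few thousand points drawn from a 5-dimensional standard Gaussian should give an estimate close to 5. The pairwise inner-product matrix makes this sketch quadratic in the number of points, so very large datasets would need subsampling or blockwise computation.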

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.06328/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1901.06328/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1901.06328/full.md

---
Source: https://tomesphere.com/paper/1901.06328