Out-of-Core Dimensionality Reduction for Large Data via Out-of-Sample Extensions
Luca Reichmann, David Hägele, Daniel Weiskopf

TL;DR
This paper introduces an out-of-sample extension approach to out-of-core dimensionality reduction. It enables the visualization of large datasets by iteratively inserting data into a reference projection computed on a small, manageable subset, and evaluates the approach across multiple DR algorithms.
Contribution
It presents a novel out-of-sample extension for metric MDS and evaluates the out-of-sample approach with five DR algorithms on large-scale data, including a use case with a billion instances.
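The summary does not specify how the metric MDS extension places new points. As a minimal sketch, assuming the common formulation in which each new point is positioned by minimizing raw stress against the fixed reference points, the SMACOF (Guttman transform) update can be applied to the single free point while every reference position stays fixed. The helpers `place_point` and `stress` are hypothetical names, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def stress(y: Vec, ref_low: Sequence[Vec], d_high: Sequence[float]) -> float:
    """Raw stress of one placed point y against fixed reference positions."""
    return sum((math.dist(y, yi) - di) ** 2 for yi, di in zip(ref_low, d_high))


def place_point(x: Vec, ref_high: Sequence[Vec], ref_low: Sequence[Vec],
                iters: int = 300, eps: float = 1e-12) -> List[float]:
    """Insert one new point x into a fixed reference projection (hypothetical helper).

    Only the new point moves; it is updated with the SMACOF / Guttman-transform
    step for unit weights, which never increases the raw stress.
    """
    d_high = [math.dist(x, xi) for xi in ref_high]  # target distances in data space
    n, dim = len(ref_low), len(ref_low[0])
    # Start at the centroid of the reference projection (an arbitrary choice).
    y = [sum(p[c] for p in ref_low) / n for c in range(dim)]
    for _ in range(iters):
        acc = [0.0] * dim
        for yi, di in zip(ref_low, d_high):
            r = math.dist(y, yi)
            # Each term proposes a position at distance di from yi along the
            # current direction; if y coincides with yi the direction is
            # undefined and, as in SMACOF, the term reduces to yi itself.
            scale = di / r if r > eps else 0.0
            for c in range(dim):
                acc[c] += yi[c] + scale * (y[c] - yi[c])
        y = [v / n for v in acc]
    return y


if __name__ == "__main__":
    # Identity "projection" of four 2-D points, so the ideal placement is x itself.
    ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(place_point([0.2, 0.7], ref, ref))
```

Because only one point moves, each update costs time linear in the reference set size, and memory use does not depend on the total number of points inserted.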
Findings
Out-of-sample extensions enable DR on large datasets.
Trade-offs between reference set size and projection quality are characterized.
The approach outperforms some recent DR methods in handling large data.
Abstract
Dimensionality reduction (DR) is a well-established approach for the visualization of high-dimensional data sets. While DR methods are often applied to typical DR benchmark data sets in the literature, they might suffer from high runtime complexity and memory requirements, making them unsuitable for large data visualization, especially in environments outside of high-performance computing. To perform DR on large data sets, we propose the use of out-of-sample extensions. Such extensions allow inserting new data into existing projections, which we leverage to iteratively project data into a reference projection that consists only of a small, manageable subset. This process makes it possible to perform DR out-of-core on large data, which would otherwise not be possible due to memory and runtime limitations. For metric multidimensional scaling (MDS), we contribute an implementation with…
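The out-of-core process described in the abstract can be sketched as two passes over chunked data, keeping only the reference subset (and one chunk) in memory. This is a minimal sketch under several assumptions: the reference subset is drawn by reservoir sampling, a simple k-nearest-neighbor interpolation stands in for the algorithm-specific out-of-sample extensions the paper evaluates, and the names `read_chunks`, `embed`, `knn_extend`, and `project_out_of_core` are hypothetical. In the demo, dropping coordinates is only a placeholder for a real DR method.

```python
import heapq
import math
import random
from typing import Callable, Iterable, Iterator, List

Vec = List[float]


def reservoir_sample(stream: Iterable[Vec], m: int, seed: int = 0) -> List[Vec]:
    """Uniformly sample m items from a stream in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[Vec] = []
    for i, item in enumerate(stream):
        if i < m:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < m:
                sample[j] = item
    return sample


def knn_extend(x: Vec, ref_high: List[Vec], ref_low: List[Vec], k: int = 5) -> Vec:
    """Stand-in out-of-sample extension: inverse-distance-weighted average of
    the low-dimensional positions of the k nearest reference points."""
    nearest = heapq.nsmallest(k, range(len(ref_high)),
                              key=lambda i: math.dist(x, ref_high[i]))
    weights = []
    for i in nearest:
        d = math.dist(x, ref_high[i])
        if d == 0.0:
            return list(ref_low[i])  # x duplicates a reference point
        weights.append(1.0 / d)
    total = sum(weights)
    dim = len(ref_low[0])
    return [sum(w * ref_low[i][c] for w, i in zip(weights, nearest)) / total
            for c in range(dim)]


def project_out_of_core(read_chunks: Callable[[], Iterator[List[Vec]]],
                        embed: Callable[[List[Vec]], List[Vec]],
                        m: int, k: int = 5) -> Iterator[List[Vec]]:
    """Two passes over chunked data; only the reference set stays in memory."""
    # Pass 1: draw the reference subset and embed it with the (expensive) DR method.
    ref_high = reservoir_sample((x for chunk in read_chunks() for x in chunk), m)
    ref_low = embed(ref_high)
    # Pass 2: insert every chunk into the fixed reference projection.
    for chunk in read_chunks():
        yield [knn_extend(x, ref_high, ref_low, k) for x in chunk]


if __name__ == "__main__":
    def read_chunks() -> Iterator[List[Vec]]:
        # Hypothetical data source: re-seeded so both passes see the same data.
        rng = random.Random(42)
        for _ in range(4):
            yield [[rng.random() for _ in range(3)] for _ in range(250)]

    def drop_last(pts: List[Vec]) -> List[Vec]:
        return [p[:2] for p in pts]  # placeholder for a real DR method

    print(sum(len(c) for c in project_out_of_core(read_chunks, drop_last, m=100)))
```

Projected chunks are yielded one at a time, so they can be written to disk or rendered incrementally instead of being collected in memory.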
Taxonomy
Topics: AI in cancer detection · Statistical Methods and Inference · Medical Image Segmentation Techniques
Methods: Sparse Evolutionary Training · Principal Components Analysis
