Uncharted Forest: A Technique for Exploratory Data Analysis
Casey Kneale, Steven D. Brown

TL;DR
This paper introduces Uncharted Forest, an unsupervised tree ensemble method for exploratory data analysis, enabling visualization of class relationships, heterogeneity, and uninformative classes in high-dimensional datasets.
Contribution
The paper presents Uncharted Forest, a novel unsupervised tree ensemble technique for visualizing and analyzing complex high-dimensional data in provenance and classification studies.
Findings
Effective visualization of class associations and heterogeneity.
Comparison of new metrics with existing variance-based clustering methods.
Application to diverse datasets demonstrating utility and limitations.
Abstract
Exploratory data analysis is crucial for developing and understanding classification models from high-dimensional datasets. We explore the utility of a new unsupervised tree ensemble called uncharted forest for visualizing class associations, sample-sample associations, class heterogeneity, and uninformative classes for provenance studies. The uncharted forest algorithm partitions data using random selections of variables and metrics based on statistical spread. After each tree is grown, a tally of the samples that arrive at every terminal node is maintained. Those tallies are stored in a single sample association matrix, from which the likelihood that any two samples are partitioned together can be estimated. That matrix may be readily viewed as a heat map, and the probabilities can be quantified via new metrics that account for class or cluster membership. We display the…
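The procedure outlined in the abstract (random variable selection, spread-based splits, tallying co-occurrence at terminal nodes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names (`sum_sq`, `best_split`, `grow_leaves`, `association_matrix`), the use of summed squared deviation as the spread metric, the fixed depth limit, the `n_try` and `min_size` parameters, and the absence of bootstrapping are all assumptions made for the example.

```python
import random


def sum_sq(values):
    """Sum of squared deviations from the mean (an assumed spread metric)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, rows, feature):
    """Return (relative_spread, threshold) for the best cut on one variable,
    or None if the variable is constant over `rows`."""
    column = [X[i][feature] for i in rows]
    cuts = sorted(set(column))
    total = sum_sq(column)
    best = None
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2.0
        left = [v for v in column if v <= t]
        right = [v for v in column if v > t]
        # Spread remaining after the cut, relative to spread before it.
        score = (sum_sq(left) + sum_sq(right)) / total
        if best is None or score < best[0]:
            best = (score, t)
    return best


def grow_leaves(X, rows, rng, depth=0, max_depth=3, n_try=2, min_size=2):
    """Grow one unsupervised tree; return its terminal nodes as index lists."""
    if depth >= max_depth or len(rows) <= min_size:
        return [rows]
    n_features = len(X[0])
    candidates = []
    # Consider only a random subset of variables at each node.
    for f in rng.sample(range(n_features), min(n_try, n_features)):
        split = best_split(X, rows, f)
        if split is not None:
            candidates.append((split[0], f, split[1]))
    if not candidates:
        return [rows]
    _, f, t = min(candidates)
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (grow_leaves(X, left, rng, depth + 1, max_depth, n_try, min_size)
            + grow_leaves(X, right, rng, depth + 1, max_depth, n_try, min_size))


def association_matrix(X, n_trees=50, seed=0, **tree_kwargs):
    """Fraction of trees in which each pair of samples shares a terminal node."""
    rng = random.Random(seed)
    n = len(X)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        for leaf in grow_leaves(X, list(range(n)), rng, **tree_kwargs):
            for i in leaf:
                for j in leaf:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


# Toy data (made up for illustration): two well-separated groups of five samples.
X = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.4, 0.4),
    (10.0, 10.1), (10.2, 10.0), (10.1, 10.3), (10.3, 10.2), (10.4, 10.4),
]
P = association_matrix(X, n_trees=50, seed=0)
```

Each entry `P[i][j]` is the share of trees in which samples `i` and `j` ended up in the same terminal node, so the matrix is symmetric and can be passed directly to any heat-map plotting routine. Dividing by the number of trees is valid here only because every sample is used in every tree. A version that subsamples rows would instead need to normalize by how often each pair was drawn together.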
