Bootstrap Confidence Regions for Learned Feature Embeddings
Kris Sankaran

TL;DR
This paper develops bootstrap-based methods to quantify uncertainty in low-dimensional projections of learned feature embeddings from high-dimensional non-matrix data, aiding interpretability.
Contribution
The paper adapts methods developed for bootstrapping principal components analysis (PCA) to features learned from non-matrix data, giving a way to assess the uncertainty of the resulting low-dimensional projections.
Findings
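The derived confidence regions are compared empirically in simulations that vary factors influencing both feature learning and the bootstrap.
The approaches are illustrated on spatial proteomic data.
Code, data, and trained models are released as an R compendium.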
Abstract
Algorithmic feature learners provide high-dimensional vector representations for non-matrix structured signals, like images, audio, text, and graphs. Low-dimensional projections derived from these representations can be used to explore variation across collections of these data. However, it is not clear how to assess the uncertainty associated with these projections. We adapt methods developed for bootstrapping principal components analysis to the setting where features are learned from non-matrix data. We empirically compare the derived confidence regions in simulations, varying factors that influence both feature learning and the bootstrap. Approaches are illustrated on spatial proteomic data. Code, data, and trained models are released as an R compendium.
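The general idea of bootstrapping a PCA projection can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: it assumes a feature matrix `X` (one row per sample) has already been extracted by some feature learner, it resamples rows without retraining that learner, and it aligns each bootstrap projection to the full-data projection with an orthogonal Procrustes rotation. The function names `pca_scores`, `procrustes_rotation`, `bootstrap_projections`, and `region_summaries` are hypothetical and are written in Python with NumPy rather than taken from the released R compendium.

```python
import numpy as np


def pca_scores(X, k=2):
    """Top-k PCA scores and loadings of a (samples x features) matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k].T


def procrustes_rotation(A, B):
    """Orthogonal matrix R minimizing the Frobenius norm of A @ R - B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def bootstrap_projections(X, k=2, n_boot=200, seed=0):
    """Project every sample with PCA refit on each bootstrap resample.

    Assumption: rows of X are resampled with replacement and the feature
    learner itself is held fixed. Returns the reference scores (n x k)
    and the aligned bootstrap scores (n_boot x n x k).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    ref_scores, _ = pca_scores(X, k)
    aligned = np.empty((n_boot, n, k))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows
        Xb = X[idx]
        _, loadings_b = pca_scores(Xb, k)     # refit PCA on the resample
        # Project all original samples using the bootstrap fit.
        scores_b = (X - Xb.mean(axis=0)) @ loadings_b
        # Remove sign and rotation ambiguity relative to the reference.
        R = procrustes_rotation(scores_b, ref_scores)
        aligned[b] = scores_b @ R
    return ref_scores, aligned


def region_summaries(aligned):
    """Per-sample center and covariance of the aligned bootstrap scores."""
    centers = aligned.mean(axis=0)
    dev = aligned - centers
    covs = np.einsum("bni,bnj->nij", dev, dev) / (aligned.shape[0] - 1)
    return centers, covs
```

Each per-sample covariance returned by `region_summaries` could be used to draw an ellipse around that sample's projected point. Whether a Gaussian ellipse is an adequate region, and whether the feature learner should also be retrained on each resample, are design choices this sketch leaves open.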
Taxonomy
Topics: Data Analysis with R · Gene expression and cancer classification
