Autoencoding Random Forests

Binh Duc Vu; Jan Kapar; Marvin Wright; David S. Watson

arXiv:2505.21441 · stat.ML · January 16, 2026

TL;DR

This paper introduces an autoencoding method for random forests. It draws on spectral graph theory and nonparametric statistics to learn low-dimensional embeddings, and it pairs them with decoders that map those embeddings back to the input space. Applications include visualization, compression, and denoising.
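
The encoding step (forest, then kernel, then spectral embedding) can be sketched in pure Python. This is a minimal sketch under several assumptions. First, the leaf index of every sample in every tree is already known; scikit-learn's `RandomForestClassifier.apply`, for example, returns one leaf index per sample and tree. Second, the kernel is the leaf-size-normalized forest kernel common in the literature. Third, the embedding uses plain eigenvector coordinates. The paper's exact kernel normalization and eigenvector scaling may differ. `forest_kernel` and `spectral_embedding` are hypothetical names.

```python
import math
import random
from collections import Counter


def forest_kernel(leaves):
    """Leaf-size-normalized forest kernel (an assumed definition).

    leaves[i][b] is the leaf index of sample i in tree b, and
    K[i][j] = (1/B) * sum_b 1[leaf_b(i) == leaf_b(j)] / |leaf_b(i)|.
    K is symmetric, and every row sums to 1.
    """
    n, n_trees = len(leaves), len(leaves[0])
    # Count how many samples fall into each leaf of each tree.
    sizes = [Counter(row[b] for row in leaves) for b in range(n_trees)]
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for b in range(n_trees):
                if leaves[i][b] == leaves[j][b]:
                    K[i][j] += 1.0 / (n_trees * sizes[b][leaves[i][b]])
    return K


def spectral_embedding(K, d, n_iter=200, seed=0):
    """Leading d nontrivial eigenvectors of a symmetric kernel via subspace iteration.

    The constant vector (eigenvalue 1, because rows of K sum to 1) is projected
    out, so the returned coordinates carry nontrivial structure.
    Assumes d is smaller than the number of remaining nonzero eigenvalues.
    """
    n = len(K)
    rng = random.Random(seed)
    const = [1.0 / math.sqrt(n)] * n
    V = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
    for _ in range(n_iter):
        # Multiply each basis vector by K.
        V = [[sum(K[i][j] * v[j] for j in range(n)) for i in range(n)] for v in V]
        # Gram-Schmidt against the constant vector and earlier basis vectors.
        basis = [const]
        for k in range(d):
            v = V[k]
            for u in basis:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            v = [a / norm for a in v]
            basis.append(v)
            V[k] = v
    # Row i of the result is the d-dimensional embedding of sample i.
    return [[V[k][i] for k in range(d)] for i in range(n)]
```

Projecting out the constant vector matters. Every row of this kernel sums to 1, so that vector is always a top eigenvector, but it carries no information about the data. When two groups of samples never share a leaf, the first remaining coordinate separates them by sign.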

Contribution

It presents a principled approach to autoencoding with random forests. The approach includes exact and approximate decoding techniques that are universally consistent under common regularity assumptions and that apply to various data types.
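
Exact decoding depends on a geometric property of trees with axis-aligned splits. Each leaf is a hyperrectangle, so an input that lands in a given set of leaves (one per tree) must lie in the intersection of their boxes. The sketch below illustrates only that step. It assumes the leaf bounds have already been extracted and uses a half-open (low, high] convention per feature, which matches trees that send `x <= threshold` to the left. It is not the paper's constrained-optimization or split-relabeling procedure. `intersect_leaves` and `representative_point` are hypothetical names.

```python
import math


def intersect_leaves(leaf_boxes, n_features):
    """Intersect axis-aligned leaf regions, one region per tree.

    Each box maps a feature index to (low, high), meaning low < x[f] <= high.
    Features that a leaf never split on are unbounded.
    Returns the intersected bounds, or None if no input reaches all the leaves.
    """
    lo = [-math.inf] * n_features
    hi = [math.inf] * n_features
    for box in leaf_boxes:
        for f, (a, b) in box.items():
            lo[f] = max(lo[f], a)
            hi[f] = min(hi[f], b)
    if any(l >= h for l, h in zip(lo, hi)):
        return None  # empty intersection: the leaf combination is infeasible
    return list(zip(lo, hi))


def representative_point(bounds):
    """Pick one point inside the region (the midpoint where both bounds are finite)."""
    point = []
    for l, h in bounds:
        if math.isfinite(l) and math.isfinite(h):
            point.append((l + h) / 2)
        elif math.isfinite(h):
            point.append(h)        # x <= h is satisfied by h itself
        elif math.isfinite(l):
            point.append(l + 1.0)  # any value strictly above l works
        else:
            point.append(0.0)      # unconstrained feature: arbitrary choice
    return point
```

Choosing the midpoint is an arbitrary assumption. Any point inside the intersection is equally consistent with the leaf assignments, which is why a decoder needs additional information to choose among them.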

Findings

1. Effective in visualizing high-dimensional data
2. Improves data compression and denoising capabilities
3. Applicable to tabular, image, and genomic datasets

Abstract

We propose a principled method for autoencoding with random forests. Our strategy builds on foundational results from nonparametric statistics and spectral graph theory to learn a low-dimensional embedding of the model that optimally represents relationships in the data. We provide exact and approximate solutions to the decoding problem via constrained optimization, split relabeling, and nearest neighbors regression. These methods effectively invert the compression pipeline, establishing a map from the embedding space back to the input space using splits learned by the ensemble's constituent trees. The resulting decoders are universally consistent under common regularity assumptions. The procedure works with supervised or unsupervised models, providing a window into conditional or joint distributions. We demonstrate various applications of this autoencoder, including powerful new tools…
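
The nearest-neighbor decoder mentioned in the abstract maps a point in embedding space back to the input space by borrowing the inputs of nearby training embeddings. Several details below are assumptions rather than the paper's specification: unweighted Euclidean k-NN in the embedding, per-feature averaging for numeric features, and a majority vote for categorical ones. `knn_decode` is a hypothetical name.

```python
import math
from collections import Counter


def knn_decode(z_new, Z_train, X_train, k=3, categorical=()):
    """Decode an embedding by k-nearest-neighbor regression (a sketch).

    Finds the k training embeddings closest to z_new (Euclidean distance) and
    combines the matching input rows feature by feature: the mean for numeric
    features, the most common value for features listed in `categorical`.
    """
    dists = [(math.dist(z_new, z), i) for i, z in enumerate(Z_train)]
    nearest = [i for _, i in sorted(dists)[:k]]
    decoded = []
    for f in range(len(X_train[0])):
        values = [X_train[i][f] for i in nearest]
        if f in categorical:
            decoded.append(Counter(values).most_common(1)[0][0])
        else:
            decoded.append(sum(values) / len(values))
    return decoded
```

For example, `knn_decode([0.05, 0.0], Z, X, k=2, categorical={1})` averages the numeric column and takes a vote on the categorical column over the two closest training points. Larger `k` smooths the reconstruction, and `k=1` returns the nearest training row unchanged.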

Videos

Autoencoding Random Forests · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Neural Networks and Applications · Machine Learning and Data Classification