Jacobian Aligned Random Forests
Sarwesh Rauniyar

TL;DR
JARF introduces a simple, gradient-based global feature space rotation to enhance axis-aligned forests, enabling them to better handle oblique decision boundaries with minimal added complexity.
Contribution
The paper presents JARF, a novel supervised preconditioning method that improves axis-aligned forests by capturing feature interactions through a global Jacobian-based rotation.
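For reference, the quantity behind such a rotation can be written as follows. This is a sketch in assumed notation, not taken from the paper: $f:\mathbb{R}^d\to\mathbb{R}^C$ is the surrogate predictor (class probabilities or regression outputs) and $J_f(x)\in\mathbb{R}^{C\times d}$ is its Jacobian.

```latex
H \;=\; \mathbb{E}_{x}\!\left[ J_f(x)^{\top} J_f(x) \right]
  \;=\; \sum_{c=1}^{C} \mathbb{E}_{x}\!\left[ \nabla f_c(x)\, \nabla f_c(x)^{\top} \right]
  \;\in\; \mathbb{R}^{d \times d}
```

With a single output ($C = 1$) this reduces to the expected gradient outer product, $\mathbb{E}_x[\nabla f(x)\,\nabla f(x)^{\top}]$, which is the sense in which the Jacobian version generalizes EGOP.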
Findings
JARF consistently improves classification and regression accuracy.
It often matches or surpasses oblique forest baselines.
Training time is reduced compared to oblique methods.
Abstract
Axis-aligned decision trees are fast and stable but struggle on datasets with rotated or interaction-dependent decision boundaries, where informative splits require linear combinations of features rather than single-feature thresholds. Oblique forests address this with per-node hyperplane splits, but at added computational cost and implementation complexity. We propose a simple alternative: JARF, Jacobian-Aligned Random Forests. Concretely, we first fit an axis-aligned forest to estimate class probabilities or regression outputs, compute finite-difference gradients of these predictions with respect to each feature, aggregate them into an expected Jacobian outer product that generalizes the expected gradient outer product (EGOP), and use it as a single global linear preconditioner for all inputs. This supervised preconditioner applies a single global rotation of the feature space, then…
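The steps the abstract describes (surrogate forest, finite-difference gradients, aggregated outer product, global linear map, refit) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the per-feature step size, the trace normalization, the ridge term, and the choice to transform inputs as `X @ M` are all hypothetical. Only the `fit_jarf` helper needs scikit-learn; the estimator itself needs only NumPy.

```python
import numpy as np


def ejop_finite_diff(predict, X, step=0.1):
    """Estimate H = mean_i J_i^T J_i with central finite differences.

    `predict` maps an (n, d) array to (n, C) outputs (or (n,) for regression).
    The step is scaled by each feature's standard deviation (an assumption).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    scale = X.std(axis=0)
    eps = step * np.where(scale > 0, scale, 1.0)  # guard constant columns
    grads = []
    for j in range(d):
        shift = np.zeros(d)
        shift[j] = eps[j]
        diff = np.asarray(predict(X + shift)) - np.asarray(predict(X - shift))
        grads.append(diff.reshape(n, -1) / (2.0 * eps[j]))  # (n, C)
    J = np.stack(grads, axis=2)  # (n, C, d): per-sample Jacobians
    return np.einsum("ncj,nck->jk", J, J) / n  # (d, d), symmetric PSD


def precondition(X, H, ridge=1e-3):
    """Apply one global linear map built from H (normalization is assumed)."""
    H = H / max(np.trace(H), 1e-12)
    M = H + ridge * np.eye(H.shape[0])  # ridge keeps every direction alive
    return np.asarray(X, dtype=float) @ M, M


def fit_jarf(X, y, n_trees=200, seed=0):
    """Hypothetical end-to-end helper: surrogate RF -> H -> refit RF."""
    from sklearn.ensemble import RandomForestClassifier

    surrogate = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    surrogate.fit(X, y)
    H = ejop_finite_diff(surrogate.predict_proba, X)
    X_tilde, M = precondition(X, H)
    final = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    final.fit(X_tilde, y)
    return final, M  # predict later with final.predict(X_new @ M)
```

Because forest predictions are piecewise constant, very small steps would give zero differences almost everywhere, which is why the sketch uses a step proportional to each feature's spread; the right step size is a tuning choice this example does not settle.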
Peer Reviews
Decision: ICLR 2026 Poster
- The core idea is simple, clean, and easy to implement on top of existing RF code.
- The method is well-motivated and clearly positioned between axis-aligned forests and oblique trees, leveraging prior EJOP work.
- Experiments are solid: realistic baselines (RF, XGBoost, RotF, CCF, SPORF), multiple datasets, plus timing comparisons.
- The mechanism analysis (alignment of oblique split normals with EJOP subspace) and ablations give good insight into why it works.

- The method heavily depends on the quality of probability estimates from the surrogate RF used to build EJOP, which is not deeply analyzed.
- It only evaluates standard tabular classification datasets and does not explore regression or more challenging/high-dimensional settings.
- There is no direct comparison to simpler global projections (e.g., PCA, LDA) used once before RF.
- The novelty is mostly in combining known pieces (EJOP + RF + preconditioning) rather than introducing fundamentally
1. It is a simple, one-pass method that uses the EJOP to perform an initial feature transformation, so a standard RF can be applied directly in the subsequent step.
1. The proposed method clearly lacks novelty, which does not match the conference standard. It is mainly based on a known paradigm, EJOP. The paper’s main change is estimating EJOP with a surrogate RF and finite differences. This feels incremental relative to existing supervised/oblique projection lines rather than a new learning principle.
2. The estimator uses finite differences of RF class probabilities to approximate Jacobians (Sec. 3.6), but the analysis later assumes $f\in\mathcal C^3$ wit
- Interesting idea to transform the data in a supervised manner before training.
- The method should be relatively fast to run, better on the compute-performance tradeoff than SPORF.

- The reported improvements in performance are **not particularly meaningful**.
- Evaluating only on **10 real datasets** is not sufficient to claim generality or robustness.
- There is **no discussion** on how to tune hyperparameters for the proposed method.
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Big Data and Digital Economy
