Explicating feature contribution using Random Forest proximity distances

Leanne S. Whitmore; Anthe George; Corey M. Hudson

arXiv:1807.06572 · stat.ML · July 18, 2018 · 1 citation

TL;DR

This paper introduces a method to quantify feature contributions in Random Forests by analyzing proximity distances, enabling better interpretability, auditing, and error analysis of black-box models.

Contribution

The paper presents a novel approach to explicating feature contributions in Random Forests using proximity distances, and fully specifies the procedure for binary features.

Findings

01. Allows calculation of feature contributions in decision-making

02. Enables auditing and explanation of black-box decisions

03. Facilitates post-hoc error analysis

Abstract

In Random Forests, proximity distances are a metric representation of data in decision space. By observing how changes in input map to the movement of instances in this space, we are able to determine the independent contribution of each feature to the decision-making process. For binary feature vectors, this process is fully specified. As these changes in input move particular instances nearer to the in-group or out-group, the independent contribution of each feature can be uncovered. Using this technique, we are able to calculate the contribution of each feature in determining how black-box decisions were made. This allows explication of the decision-making process, audit of the classifier, and post-hoc analysis of errors in classification.
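The abstract gives no code, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It uses the common Random Forest proximity definition (the fraction of trees in which two instances land in the same leaf, with distance taken as 1 minus proximity). The trained forest is replaced by three hand-written, hypothetical decision stumps so the example runs without any ML library. The contribution score (how much flipping one binary feature changes the gap between mean distance to the out-group and mean distance to the in-group) is an illustrative choice, and the names `proximity`, `group_distance`, `flip_contributions`, `IN_GROUP`, and `OUT_GROUP` are likewise hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained forest: each "tree" maps a binary
# feature vector to the id of the leaf it ends up in.
Tree = Callable[[Sequence[int]], int]


def proximity(forest: Sequence[Tree], a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of trees in which a and b land in the same leaf."""
    return sum(tree(a) == tree(b) for tree in forest) / len(forest)


def group_distance(forest: Sequence[Tree], x: Sequence[int],
                   group: Sequence[Sequence[int]]) -> float:
    """Mean proximity distance (1 - proximity) from x to each member of group."""
    return sum(1.0 - proximity(forest, x, g) for g in group) / len(group)


def flip_contributions(forest: Sequence[Tree], x: Sequence[int],
                       in_group: Sequence[Sequence[int]],
                       out_group: Sequence[Sequence[int]]) -> List[float]:
    """Score each binary feature by flipping it and measuring how far x moves.

    margin(v) = mean distance to out-group - mean distance to in-group,
    so a larger margin means v sits closer to the in-group. A feature's
    score is margin(x) - margin(x with that feature flipped): a positive
    score means its current value pulls x toward the in-group.
    (Illustrative scoring rule, assumed for this sketch.)
    """
    def margin(v: Sequence[int]) -> float:
        return group_distance(forest, v, out_group) - group_distance(forest, v, in_group)

    base = margin(x)
    scores = []
    for i in range(len(x)):
        flipped = list(x)
        flipped[i] = 1 - flipped[i]  # toggle feature i (0 <-> 1)
        scores.append(base - margin(flipped))
    return scores


# Toy forest of three decision stumps (hypothetical): two split on
# feature 0, one on feature 1, none on feature 2.
FOREST: List[Tree] = [lambda v: v[0], lambda v: v[1], lambda v: v[0]]
IN_GROUP = [[1, 1, 0], [1, 1, 1]]
OUT_GROUP = [[0, 0, 0], [0, 1, 1]]
X = [1, 1, 1]

scores = flip_contributions(FOREST, X, IN_GROUP, OUT_GROUP)
print([round(s, 3) for s in scores])  # [1.333, 0.333, 0.0]
```

Feature 2 is never used by any tree, so flipping it leaves every leaf assignment unchanged and its score is exactly zero, while feature 0, split on by two of the three trees, moves the instance furthest. With a real forest, per-tree leaf indices could instead come from scikit-learn's `RandomForestClassifier.apply`, which returns an array of leaf indices of shape (n_samples, n_estimators).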

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Imbalanced Data Classification Techniques