Using Markov Boundary Approach for Interpretable and Generalizable Feature Selection

Anwesha Bhattacharyya, Yaqun Wang, Joel Vaughan, and Vijayan N. Nair

arXiv:2307.14327 · stat.AP · March 11, 2025

TL;DR

This paper introduces a multi-group forward-backward selection method for identifying Markov boundaries in complex data, enhancing feature selection for more interpretable and generalizable machine learning models.

Contribution

It proposes a novel strategy to accurately identify Markov boundaries in non-linear and mixed data types, addressing limitations of existing methods.

Findings

01 Effective on simulated datasets

02 Demonstrates improved feature selection accuracy

03 Applicable to real-world datasets

Abstract

The perceived advantage of machine learning (ML) models is that they are flexible and can incorporate a large number of features. However, many of these are typically correlated or dependent, and incorporating all of them can hinder model stability and generalizability. In fact, it is desirable to do some form of feature screening and incorporate only the relevant features. The best approaches should involve subject-matter knowledge and information on causal relationships. This paper deals with an approach called Markov boundary (MB) that is related to causal discovery, using directed acyclic graphs to represent potential relationships and using statistical tests to determine the connections. An MB is the minimum set of features that guarantees that other potential predictors do not affect the target given the boundary while ensuring maximal predictive accuracy. Identifying the Markov…
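To make the definition of a Markov boundary concrete, here is a minimal sketch that reads the boundary off a directed acyclic graph that is assumed to be already known. Under the standard faithfulness assumption, the Markov boundary of a target in a DAG is its parents, its children, and the other parents of its children. This is not the paper's selection method, which works from data when the graph is unknown; the function name `markov_boundary` and the toy graph are hypothetical and exist only for illustration.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the parents, children, and co-parents of `target` in a DAG.

    `parents` maps every node to the set of its direct parents.
    Assumes the graph is known and faithful to the data distribution.
    """
    direct_parents = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, pa in parents.items() if target in pa}
    # Co-parents ("spouses") share at least one child with the target.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    spouses.discard(target)
    return direct_parents | children | spouses


# Hypothetical toy DAG: A -> Y, B -> Y, Y -> C, D -> C, A -> E, C -> F
toy_dag = {
    "A": set(),
    "B": set(),
    "D": set(),
    "Y": {"A", "B"},
    "C": {"Y", "D"},
    "E": {"A"},
    "F": {"C"},
}

boundary = markov_boundary(toy_dag, "Y")
print(sorted(boundary))  # ['A', 'B', 'C', 'D']
```

In this toy graph, E and F are statistically related to Y (through A and C respectively) but fall outside the boundary: once A, B, C, and D are known, they carry no further information about Y. D is included even though it is not adjacent to Y, because conditioning on the shared child C makes D informative about Y.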

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Neural Networks and Applications · Fault Detection and Control Systems