Clustering of variables for enhanced interpretability of predictive models

Evelyne Vigneau

arXiv:2008.07924·stat.AP·July 14, 2023·Informatica

Open Access

TL;DR

This paper introduces lmCLV, a new method combining variable clustering and boosting to create interpretable predictive models in high-dimensional, correlated data, demonstrated on simulated and real datasets.

Contribution

The paper presents lmCLV, a novel approach that enhances interpretability of predictive models by integrating variable clustering with boosting, applicable to high-dimensional datasets.

Findings

01  Comparable predictive accuracy to existing methods

02  Enhanced interpretability of models

03  Effective in high-dimensional, correlated data

Abstract

A new strategy is proposed for building easy-to-interpret predictive models from a high-dimensional dataset with a large number of highly correlated explanatory variables. The strategy begins with a variable clustering step using the CLustering of Variables around Latent Variables (CLV) method. The hierarchical clustering dendrogram is then explored in order to select the explanatory variables sequentially, group by group. To fit the model, the dendrogram is used as the base-learner in an L2-boosting procedure. The proposed approach, named lmCLV, is illustrated on a simulated toy example in which the clusters and the predictive equation are known in advance, and on a real case study dealing with the authentication of orange juices based on 1H-NMR spectroscopic analysis. In both illustrative examples, this procedure was…
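The group-wise selection idea in the abstract can be illustrated with a small sketch. This is not the lmCLV implementation: it assumes a fixed partition of the variables instead of exploring a CLV dendrogram, and it takes each cluster's latent variable to be the mean of its standardized members. The function names (`cluster_latents`, `l2_boost`), the step count, and the shrinkage factor `nu` are all hypothetical choices made for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def standardize(column):
    # Center and scale one variable to unit (population) standard deviation.
    m = mean(column)
    sd = (sum((v - m) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - m) / sd for v in column]

def cluster_latents(columns, clusters):
    # Assumption: one latent variable per cluster, taken as the mean of the
    # standardized member variables. Each latent is therefore centered.
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[mean([z[j][i] for j in members]) for i in range(n)]
            for members in clusters]

def l2_boost(latents, y, n_steps=200, nu=0.1):
    # Component-wise L2-boosting: at each step, regress the current residuals
    # on every cluster latent, keep the best one, and take a shrunken step.
    intercept = mean(y)
    resid = [v - intercept for v in y]
    coefs = [0.0] * len(latents)
    for _ in range(n_steps):
        best = None
        for k, lat in enumerate(latents):
            sxx = sum(v * v for v in lat)
            sxy = sum(v * r for v, r in zip(lat, resid))
            slope = sxy / sxx          # least-squares slope (no intercept needed)
            gain = slope * sxy         # reduction in residual sum of squares
            if best is None or gain > best[0]:
                best = (gain, k, slope)
        _, k, slope = best
        coefs[k] += nu * slope
        resid = [r - nu * slope * v for r, v in zip(resid, latents[k])]
    return intercept, coefs

# Toy data (hypothetical): two groups of three correlated variables;
# the response depends only on the first group.
random.seed(0)
n = 200
g1 = [random.gauss(0, 1) for _ in range(n)]
g2 = [random.gauss(0, 1) for _ in range(n)]
columns = ([[g + random.gauss(0, 0.3) for g in g1] for _ in range(3)]
           + [[g + random.gauss(0, 0.3) for g in g2] for _ in range(3)])
y = [2.0 * g + random.gauss(0, 0.5) for g in g1]
clusters = [[0, 1, 2], [3, 4, 5]]

latents = cluster_latents(columns, clusters)
intercept, coefs = l2_boost(latents, y)
print("cluster coefficients:", coefs)
```

Because the model has one coefficient per cluster rather than one per variable, the fitted equation can be read in terms of groups of correlated variables, which is the kind of interpretability the abstract describes.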

Taxonomy

Topics: Metabolomics and Mass Spectrometry Studies · Advanced Chemical Sensor Technologies · Neural Networks and Applications