Clustering of variables for enhanced interpretability of predictive models
Evelyne Vigneau

TL;DR
This paper introduces lmCLV, a new method combining variable clustering and boosting to create interpretable predictive models in high-dimensional, correlated data, demonstrated on simulated and real datasets.
Contribution
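The paper presents lmCLV, an approach that makes predictive models easier to interpret by using the CLV variable-clustering dendrogram as the base-learner in an L2-boosting procedure, so that explanatory variables are selected group by group in high-dimensional, highly correlated datasets.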
Findings
Comparable predictive accuracy to existing methods
Enhanced interpretability of models
Effective in high-dimensional, correlated data
Abstract
A new strategy is proposed for building easy-to-interpret predictive models from a high-dimensional dataset with a large number of highly correlated explanatory variables. The strategy starts with a variable clustering step using the CLustering of Variables around Latent Variables (CLV) method. The hierarchical clustering dendrogram is then explored in order to select the explanatory variables sequentially, in a group-wise fashion. To fit the model, the dendrogram is used as the base-learner in an L2-boosting procedure. The proposed approach, named lmCLV, is illustrated on a toy simulated example in which the clusters and the predictive equation are already known, and on a real case study dealing with the authentication of orange juices based on 1H-NMR spectroscopic analysis. In both illustrative examples, this procedure was…
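The group-wise selection idea in the abstract can be illustrated with a minimal Python sketch. This is not the author's lmCLV implementation: it assumes a fixed partition of the variables (for example, one obtained by cutting a dendrogram at a single level) instead of exploring the whole CLV hierarchy, and it assumes each group is summarized by the mean of its standardized variables. The function name `groupwise_l2_boost` and the parameters `groups`, `n_steps`, and `nu` are hypothetical and chosen only for illustration.

```python
import numpy as np


def groupwise_l2_boost(X, y, groups, n_steps=100, nu=0.1):
    """Simplified L2-boosting with one candidate base-learner per variable group.

    Assumptions (not taken from the paper): `groups` is a fixed list of
    column-index arrays, each group is represented by the mean of its
    standardized variables, and no column of X is constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize every explanatory variable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # One centered component per group: the group mean.
    comps = np.column_stack([Z[:, g].mean(axis=1) for g in groups])
    sumsq = (comps ** 2).sum(axis=0)

    intercept = y.mean()
    resid = y - intercept
    group_coef = np.zeros(len(groups))

    for _ in range(n_steps):
        # Least-squares slope of the current residual on each component.
        slopes = comps.T @ resid / sumsq
        # Pick the group whose component reduces the residual sum of squares most.
        best = int(np.argmax(slopes ** 2 * sumsq))
        # Shrunken update (step length nu), as in L2-boosting.
        group_coef[best] += nu * slopes[best]
        resid -= nu * slopes[best] * comps[:, best]

    pred = intercept + comps @ group_coef
    return intercept, group_coef, pred


# Toy data: three groups of five correlated variables; only group 0 drives y.
rng = np.random.default_rng(0)
n = 200
latents = rng.normal(size=(n, 3))
X = np.repeat(latents, 5, axis=1) + 0.3 * rng.normal(size=(n, 15))
y = 2.0 * latents[:, 0] + 0.1 * rng.normal(size=n)
groups = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)]

intercept, group_coef, pred = groupwise_l2_boost(X, y, groups)
print("group coefficients:", np.round(group_coef, 3))
```

Because every variable in a group shares one coefficient, the fitted model can be read at the level of groups rather than individual variables, which is the kind of interpretability the abstract describes.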
Taxonomy
Topics: Metabolomics and Mass Spectrometry Studies · Advanced Chemical Sensor Technologies · Neural Networks and Applications
