Explainable AI for Data-Driven Design of High-Dimensional Predictive Studies

Junyu Yan; Damian Machlanski; Kurt Butler; Panagiotis Dimitrakopoulos; Ewen M Harrison; Bruce Guthrie; Sotirios A Tsaftaris

arXiv:2605.22243 · cs.LG · May 22, 2026

TL;DR

This paper introduces an explainable AI framework that gives data-driven recommendations for improving existing interpretable predictive models in high-dimensional health data analysis, raising predictive performance while keeping the models interpretable.

Contribution

The study develops an AI-based recommender system that suggests feature modifications to optimize interpretable predictive models, validated on clinical and public datasets.

Findings

01 Improved C-index from 0.805 to 0.815 in a clinical dataset.

02 Recommended excluding 23 features and adding 221 interactions.

03 Effective across multiple datasets, demonstrating broad applicability.
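
For readers unfamiliar with the C-index reported in the first finding, the following is a minimal sketch of how a concordance index can be computed. It assumes Harrell's definition for right-censored time-to-event data; this summary does not say which variant the paper uses, and the function name `harrell_c_index` and the toy data are illustrative only.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs whose predicted risk ordering matches the observed ordering.

    times:       observed follow-up time per subject
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the earlier subject had the event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event got the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four subjects, the third one censored.
c = harrell_c_index(times=[2, 4, 6, 8], events=[1, 1, 0, 1], risk_scores=[0.9, 0.7, 0.8, 0.1])
print(c)  # 4 of the 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives some context for a change from 0.805 to 0.815.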

Abstract

Predictive modelling is important for health data analysis and data-driven clinical decision-making. However, predictive studies are challenging to design optimally by hand when tens or even hundreds of features require selection, transformation, or interaction modelling. While complex machine learning models offer high performance, their "black-box" nature limits the clinical trust, transparency, and interpretability required for decision-making. We developed and evaluated an Exploratory AI Recommender that provides data-driven recommendations to improve predictive performance of existing interpretable statistical models. The developed framework uses flexible AI modelling to capture complex data patterns and explainable AI techniques to translate the patterns into three recommendation types: feature exclusion, non-linear terms, and feature interactions. We evaluated the framework by…
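
To make the feature-exclusion recommendation type more concrete, here is a minimal sketch of one way such a recommendation could be derived from a flexible model. It uses permutation importance as a stand-in explainability technique; the abstract does not name the method the authors actually use, and `permutation_importance`, `recommend_exclusions`, the tolerance, and the toy model are all hypothetical.

```python
import random


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in squared error when one feature column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle a single column to break its link with the outcome.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def recommend_exclusions(importances, names, tol=1e-9):
    """Flag features whose shuffling does not degrade the flexible model (assumed rule)."""
    return [name for name, imp in zip(names, importances) if imp <= tol]


# Toy stand-in for a fitted flexible model: it ignores the third feature entirely.
def predict(row):
    return 2 * row[0] + row[1] ** 2


data_rng = random.Random(1)
names = ["age", "dose", "noise"]  # hypothetical feature names
X = [[data_rng.uniform(-1, 1) for _ in names] for _ in range(40)]
y = [predict(row) for row in X]

importances = permutation_importance(predict, X, y)
print(recommend_exclusions(importances, names))
```

In this toy setup the squared `dose` term would also be a candidate for a non-linear-term recommendation, but detecting that, or detecting interactions, needs a separate check that this sketch does not attempt.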
