A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning

Agnideep Aich; Md Monzur Murshed; Sameera Hewage; Amanda Mayeaux

arXiv:2505.22554·stat.ML·March 5, 2026

TL;DR

This paper introduces a novel copula-based supervised filter for feature selection that emphasizes extreme value associations, improving interpretability and efficiency in diabetes risk prediction models.

Contribution

It proposes a tail-focused feature ranking method based on a Gumbel-copula implied upper-tail concordance score, which matches or outperforms standard selectors on a large-scale public health survey (CDC) and a clinical benchmark (PIMA).

Findings

01 Reduces features by approximately 52% on CDC dataset

02 Achieves the highest ROC-AUC on PIMA dataset

03 Provides a fast, interpretable feature screening method

Abstract

Effective feature selection is critical for robust and interpretable predictive modeling in medicine, especially when risk factors matter most in extreme patient strata. Many standard selectors emphasize average associations and can miss predictors whose relevance is concentrated in the distribution tails. We propose a computationally efficient supervised filter based on a Gumbel-copula implied upper-tail concordance score (λ_U), defined as a monotone transformation of Kendall's tau, to rank features by their tendency to be simultaneously extreme with the positive class. We compare against four common baselines (Mutual Information, mRMR, ReliefF, and L1/Elastic-Net) across four classifiers on two diabetes datasets: a large-scale public health survey (CDC, N=253,680) and a clinical benchmark (PIMA, N=768). Analyses include statistical testing, permutation importance, and robustness…
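The score described in the abstract can be illustrated with a minimal sketch. It relies on the standard Gumbel copula relations tau = 1 - 1/theta and λ_U = 2 - 2^(1/theta), which combine to λ_U = 2 - 2^(1 - tau), a monotone function of Kendall's tau. The abstract does not give the authors' exact procedure, so the following are assumptions made for illustration only: the use of tau-b to handle the many ties a binary label produces, the clipping of negative tau to zero (the Gumbel family only covers non-negative dependence), and the helper names `kendall_tau_b`, `gumbel_lambda_u`, and `rank_features`. The pure-Python tau is quadratic in the sample size and is written for readability, not for the large-scale setting the paper targets.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2) sketch)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted in neither total
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x_only)
        * (concordant + discordant + ties_y_only)
    )
    return (concordant - discordant) / denom if denom > 0 else 0.0


def gumbel_lambda_u(tau):
    """Upper-tail dependence implied by a Gumbel copula with Kendall's tau.

    Assumption: tau is clipped to [0, 1], since the Gumbel family does not
    model negative dependence.
    """
    tau = min(max(tau, 0.0), 1.0)
    return 2.0 - 2.0 ** (1.0 - tau)


def rank_features(features, y):
    """Rank features (dict of name -> values) by lambda_U against label y."""
    scores = {
        name: gumbel_lambda_u(kendall_tau_b(values, y))
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from CDC or PIMA.
    y = [0, 0, 0, 1, 1, 1]
    features = {
        "a": [1, 2, 3, 4, 5, 6],  # perfectly ordered with the label
        "b": [5, 1, 3, 2, 6, 4],  # weakly ordered with the label
    }
    for name, score in rank_features(features, y):
        print(f"{name}: {score:.3f}")
```

A filter built on this score would keep the top-ranked features, or those above a threshold, before fitting any classifier. The abstract does not specify the selection rule, so that choice is left open here.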
