# Measuring the effects of confounders in medical supervised classification problems: the Confounding Index (CI)

**Authors:** Elisa Ferrari, Alessandra Retico, Davide Bacciu

arXiv: 1905.08871 · 2020-02-05

## TL;DR

This paper introduces the Confounding Index (CI), a new metric to quantify the impact of confounders in biomedical classification tasks, aiding in bias detection and correction.

## Contribution

The Confounding Index (CI) measures the confounding effect of a data attribute in a bias-agnostic way, so that the effects of different candidate variables can be compared quantitatively and bias correction in biomedical data analysis can be targeted accordingly.

## Key findings

- The CI's effectiveness is validated on simulated data.
- The CI is also validated on real-world neuroimaging data.
- The CI can inform correction methods such as normalization procedures or ad hoc learning algorithms.

## Abstract

Over the years, there has been growing interest in using Machine Learning techniques for biomedical data processing. When tackling these tasks, one needs to bear in mind that biomedical data depends on a variety of characteristics, such as demographic aspects (age, gender, etc.) or the acquisition technology, which might be unrelated to the target of the analysis. In supervised tasks, failing to match the ground truth targets with respect to such characteristics, called confounders, may lead to very misleading estimates of the predictive performance. Many strategies have been proposed to handle confounders, ranging from data selection, to normalization techniques, up to the use of training algorithms for learning with imbalanced data. However, all these solutions require the confounders to be known a priori. To this end, we introduce a novel index that is able to measure the confounding effect of a data attribute in a bias-agnostic way. This index can be used to quantitatively compare the confounding effects of different variables and to inform correction methods such as normalization procedures or ad hoc learning algorithms. The effectiveness of this index is validated on both simulated data and real-world neuroimaging data.
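The abstract's claim that unmatched confounders can produce misleading performance estimates can be illustrated with a small simulation. The sketch below is not the Confounding Index itself, whose definition is not given in this summary. It is a toy example under stated assumptions: a hypothetical binary confounder (for instance, acquisition site), a single feature that depends only on that confounder, and a simple midpoint-threshold classifier. The names `simulate`, `fit_threshold`, `accuracy`, and `p_match` are introduced here for illustration only.

```python
import numpy as np


def simulate(n, p_match, rng):
    """Draw n samples with a binary label y and a binary confounder c.

    c equals y with probability p_match (0.5 means label and confounder
    are matched, i.e. independent). The feature x depends ONLY on c,
    so it carries no genuine information about y.
    """
    y = rng.integers(0, 2, size=n)
    agree = rng.random(n) < p_match
    c = np.where(agree, y, 1 - y)
    x = c + rng.normal(0.0, 0.5, size=n)
    return x, y


def fit_threshold(x, y):
    """Minimal one-feature classifier: midpoint between the class means."""
    return 0.5 * (x[y == 0].mean() + x[y == 1].mean())


def accuracy(x, y, threshold):
    """Fraction of samples where (x > threshold) agrees with the label."""
    return float(np.mean((x > threshold).astype(int) == y))


rng = np.random.default_rng(0)

# Training set and first test set share the same label/confounder imbalance.
x_train, y_train = simulate(5000, 0.9, rng)
x_biased, y_biased = simulate(5000, 0.9, rng)
# Second test set is matched: the confounder is independent of the label.
x_matched, y_matched = simulate(5000, 0.5, rng)

threshold = fit_threshold(x_train, y_train)
acc_biased = accuracy(x_biased, y_biased, threshold)
acc_matched = accuracy(x_matched, y_matched, threshold)

print(f"accuracy on biased test set:  {acc_biased:.3f}")
print(f"accuracy on matched test set: {acc_matched:.3f}")
```

In this toy setting the classifier scores well above chance on the biased test set even though the feature contains no information about the label, and it falls back to roughly chance level once the confounder is matched across classes. The gap between the two numbers is the kind of effect a confounding measure is meant to expose.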

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.08871/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1905.08871/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1905.08871/full.md

---
Source: https://tomesphere.com/paper/1905.08871