# Measuring the prediction difficulty of individual cases in a dataset using machine learning

**Authors:** Hyunjin Kwon, Matthew Greenberg, Colin Bruce Josephson, Joon Lee

PMC · DOI: 10.1038/s41598-024-61284-z · Scientific Reports · 2024-05-07

## TL;DR

This paper introduces three new metrics to measure how hard it is for machine learning models to predict individual cases in a dataset.

## Contribution

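The paper proposes three novel metrics for assessing the prediction difficulty of individual cases using fully-connected feedforward neural networks, without the dataset preconditions that earlier case difficulty metrics require.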

## Key findings

- The proposed metrics differentiated levels of prediction difficulty better than most of the fifteen existing metrics they were compared against.
- The metrics showed consistent effectiveness across diverse datasets.
- The authors expect the metrics to give researchers a new perspective for understanding their datasets and applying machine learning in various fields.

## Abstract

Different levels of prediction difficulty are one of the key factors that researchers encounter when applying machine learning to data. Although previous studies have introduced various metrics for assessing the prediction difficulty of individual cases, these metrics require specific dataset preconditions. In this paper, we propose three novel metrics for measuring the prediction difficulty of individual cases using fully-connected feedforward neural networks. The first metric is based on the complexity of the neural network needed to make a correct prediction. The second metric employs a pair of neural networks: one makes a prediction for a given case, and the other predicts whether the prediction made by the first model is likely to be correct. The third metric assesses the variability of the neural network’s predictions. We investigated these metrics using a variety of datasets, visualized their values, and compared them to fifteen existing metrics from the literature. The results demonstrate that the proposed case difficulty metrics were better able to differentiate various levels of difficulty than most of the existing metrics and show constant effectiveness across diverse datasets. We expect our metrics will provide researchers with a new perspective on understanding their datasets and applying machine learning in various fields.
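The abstract describes each metric in a single sentence. The sketch below is a simplified, hypothetical reading of those sentences, not the authors' definitions: the neural networks are assumed to be trained elsewhere, and only their outputs for one case are passed in. The function names, the normalization in `complexity_difficulty`, and the use of the population standard deviation in `variability_difficulty` are all illustrative assumptions.

```python
from statistics import pstdev
from typing import List, Sequence

# Hypothetical sketches of the three ideas described in the abstract.
# The neural networks are assumed to be trained elsewhere; each function
# only turns their outputs for one case into a difficulty score in [0, 1].


def complexity_difficulty(correct_by_size: Sequence[bool]) -> float:
    """Metric 1 (assumed reading): model capacity needed for a correct prediction.

    correct_by_size[i] is True if the i-th network, ordered from least to
    most complex, classified the case correctly. Returns the normalized
    position of the first correct network, or 1.0 if none succeeds.
    """
    for i, ok in enumerate(correct_by_size):
        if ok:
            return i / len(correct_by_size)
    return 1.0


def correctness_targets(predictions: Sequence[int], labels: Sequence[int]) -> List[int]:
    """Metric 2, step 1: training labels for the second network (1 = first network was right)."""
    return [int(p == y) for p, y in zip(predictions, labels)]


def paired_model_difficulty(p_first_model_correct: float) -> float:
    """Metric 2, step 2: difficulty is the second network's doubt about the first."""
    return 1.0 - p_first_model_correct


def variability_difficulty(true_class_probs: Sequence[float]) -> float:
    """Metric 3 (assumed reading): spread of the true-class probability across
    networks retrained with different random seeds (population std. dev.)."""
    return pstdev(true_class_probs)


if __name__ == "__main__":
    # Toy inputs for a single case, invented purely for illustration.
    print(complexity_difficulty([False, False, True, True]))  # 0.5
    print(correctness_targets([1, 0, 1], [1, 1, 1]))  # [1, 0, 1]
    print(paired_model_difficulty(0.75))  # 0.25
    print(variability_difficulty([0.25, 0.75]))  # 0.25
```

One caveat of this simplified reading: a spread score alone gives a case that every network consistently gets wrong the same low value as one they all get right, so it is shown here only to illustrate the idea of prediction variability.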

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11076552/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11076552/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC11076552/full.md

---
Source: https://tomesphere.com/paper/PMC11076552