# Filtering out mislabeled training instances using black-box optimization and quantum annealing

**Authors:** Makoto Otsuka, Kento Kodama, Keisuke Morita, Masayuki Ohzeki

PMC · DOI: 10.1038/s41598-025-21686-z · Scientific Reports · 2025-10-29

## TL;DR

This paper introduces a method that removes mislabeled instances from training datasets by combining surrogate model-based black-box optimization with quantum annealing to find training subsets that yield low validation loss.

## Contribution

A novel integration of surrogate model-based black-box optimization (BBO) with postprocessing and quantum annealing for efficiently filtering mislabeled instances out of training datasets.

## Key findings

- The method effectively prioritizes removal of high-risk mislabeled instances in a noisy majority bit task.
- Running D-Wave’s clique sampler on a physical quantum annealer gives faster optimization and higher-quality training subsets than OpenJij’s simulated quantum annealing sampler or Neal’s simulated annealing sampler.
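- The framework is presented as scalable and is demonstrated on supervised learning; extensions to unsupervised learning, real-world datasets, and large-scale implementations are left for future work.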
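The noisy majority bit task and the validation-loss objective that scores a filtered subset can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' setup. It assumes three things. First, each input is a random bit vector labelled 1 when more than half of its bits are 1. Second, noise is injected by flipping a random fraction of training labels. Third, the black box trains a logistic-regression classifier on the kept instances and returns the mean log-loss on a clean validation set. All function names (`make_majority_data`, `flip_labels`, `train_logreg`, `validation_loss`) are hypothetical.

```python
import math
import random


def make_majority_data(num_samples, num_bits, rng):
    """Hypothetical task: random bit vectors labelled 1 when more than half the bits are 1."""
    xs = [[rng.randint(0, 1) for _ in range(num_bits)] for _ in range(num_samples)]
    ys = [1 if 2 * sum(x) > num_bits else 0 for x in xs]
    return xs, ys


def flip_labels(ys, noise_rate, rng):
    """Flip a random fraction of labels; return the noisy labels and the flipped indices."""
    flipped = set(rng.sample(range(len(ys)), round(noise_rate * len(ys))))
    noisy = [1 - y if i in flipped else y for i, y in enumerate(ys)]
    return noisy, flipped


def _sigmoid(z):
    # Numerically stable logistic function (no overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(xs, ys, epochs=300, lr=0.5):
    """Logistic regression (weights + bias) fitted by full-batch gradient descent."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, gw)]
        b -= lr * gb / len(xs)
    return w, b


def validation_loss(mask, train_x, train_y, val_x, val_y):
    """Black-box objective: train on instances with mask[i] == 1, return mean validation log-loss."""
    kept = [i for i, keep in enumerate(mask) if keep]
    if not kept:
        return float("inf")  # an empty training subset is treated as infeasible
    w, b = train_logreg([train_x[i] for i in kept], [train_y[i] for i in kept])
    total = 0.0
    for x, y in zip(val_x, val_y):
        p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(val_x)


if __name__ == "__main__":
    rng = random.Random(0)
    train_x, clean_y = make_majority_data(40, 5, rng)
    train_y, flipped = flip_labels(clean_y, 0.2, rng)
    val_x, val_y = make_majority_data(200, 5, rng)
    keep_all = [1] * len(train_y)
    # Oracle mask: drops exactly the flipped instances (only known because noise is synthetic).
    oracle = [0 if i in flipped else 1 for i in range(len(train_y))]
    print("keep all:", validation_loss(keep_all, train_x, train_y, val_x, val_y))
    print("oracle filter:", validation_loss(oracle, train_x, train_y, val_x, val_y))
```

The decision variable is a binary mask with one entry per training instance (1 = keep). Evaluating it means retraining a model, which is why it is treated as an expensive black box rather than a closed-form objective.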

## Abstract

This study proposes an approach for removing mislabeled instances from contaminated training datasets by combining surrogate model-based black-box optimization (BBO) with postprocessing and quantum annealing. Mislabeled training instances, a common issue in real-world datasets, often degrade model generalization, necessitating robust and efficient noise-removal strategies. The proposed method evaluates filtered training subsets based on validation loss, iteratively refines loss estimates through surrogate model-based BBO with postprocessing, and leverages quantum annealing to efficiently sample diverse training subsets with low validation error. Experiments on a noisy majority bit task demonstrate the method’s ability to prioritize the removal of high-risk mislabeled instances. Integrating D-Wave’s clique sampler running on a physical quantum annealer achieves faster optimization and higher-quality training subsets compared to OpenJij’s simulated quantum annealing sampler or Neal’s simulated annealing sampler, offering a scalable framework for enhancing dataset quality. This work highlights the effectiveness of the proposed method for supervised learning tasks, with future directions including its application to unsupervised learning, real-world datasets, and large-scale implementations.
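The loop described in the abstract (evaluate a subset, refine a surrogate of the loss, sample new low-loss subsets with an annealer) can be sketched as below, under several assumptions. The surrogate is assumed to be a quadratic model over the keep/remove bits fitted by ridge regression, so minimizing it is a QUBO, the problem form annealers accept. Classical single-flip simulated annealing stands in for D-Wave's clique sampler, OpenJij, or Neal. `toy_objective` and its `MISLABELED` set are a hypothetical cheap substitute for the real train-and-validate black box. The paper's specific surrogate model and its postprocessing step are not described in this summary and are not reproduced.

```python
import itertools
import math
import random

# Hypothetical toy black box standing in for "train on subset, measure validation loss":
# keeping a mislabeled instance costs 1, removing a clean instance costs 0.3.
N = 6
MISLABELED = {1, 4}


def toy_objective(mask):
    kept_bad = sum(mask[i] for i in MISLABELED)
    removed_clean = sum(1 - mask[i] for i in range(N) if i not in MISLABELED)
    return kept_bad + 0.3 * removed_clean


def features(mask):
    """Constant, linear, and pairwise terms, so the surrogate is a QUBO in `mask`."""
    pairs = [mask[i] * mask[j] for i, j in itertools.combinations(range(len(mask)), 2)]
    return [1.0] + [float(m) for m in mask] + [float(p) for p in pairs]


def ridge_fit(rows, targets, lam=1e-2):
    """Least squares with an L2 penalty: solve (A^T A + lam*I) w = A^T y."""
    d = len(rows[0])
    aug = []
    for i in range(d):
        row = [sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
        row.append(sum(r[i] * t for r, t in zip(rows, targets)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting (matrix is positive definite for lam > 0).
    for col in range(d):
        piv = max(range(col, d), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for k in range(d):
            if k != col:
                f = aug[k][col] / aug[col][col]
                aug[k] = [a - f * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][d] / aug[i][i] for i in range(d)]


def surrogate(weights, mask):
    return sum(w * f for w, f in zip(weights, features(mask)))


def anneal(weights, n, rng, sweeps=200, t_start=2.0, t_end=0.01):
    """Single-flip simulated annealing on the surrogate; a classical stand-in for the
    quantum or simulated quantum annealers compared in the paper."""
    x = [rng.randint(0, 1) for _ in range(n)]
    energy = surrogate(weights, x)
    best, best_energy = x[:], energy
    for s in range(sweeps):
        temp = t_start * (t_end / t_start) ** (s / (sweeps - 1))  # geometric cooling
        for i in range(n):
            x[i] ^= 1  # propose toggling keep/remove for instance i
            new_energy = surrogate(weights, x)
            if new_energy <= energy or rng.random() < math.exp((energy - new_energy) / temp):
                energy = new_energy
                if energy < best_energy:
                    best, best_energy = x[:], energy
            else:
                x[i] ^= 1  # reject: undo the flip
    return best


def bbo(n_init=8, n_iter=20, seed=0):
    """Surrogate-based BBO: fit surrogate, minimize it, evaluate the black box, repeat."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(N)] for _ in range(n_init)]
    ys = [toy_objective(x) for x in xs]
    for _ in range(n_iter):
        weights = ridge_fit([features(x) for x in xs], ys)
        cand = anneal(weights, N, rng)
        if cand in xs:  # already evaluated: perturb one bit to keep exploring
            cand = cand[:]
            cand[rng.randrange(N)] ^= 1
        xs.append(cand)
        ys.append(toy_objective(cand))
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best]


if __name__ == "__main__":
    mask, loss = bbo()
    print("best mask:", mask, "loss:", loss)
```

Replacing `toy_objective` with a real train-and-validate function leaves the loop structure unchanged. Only one black-box evaluation is spent per iteration, while the annealer searches the cheap surrogate.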

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12572169/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12572169/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12572169/full.md

---
Source: https://tomesphere.com/paper/PMC12572169