# Robust Deep Active Learning via Distance-Measured Data Mixing and Adversarial Training

**Authors:** Shinan Song, Xing Wang, Shike Dong, Jingyan Jiang

PMC · DOI: 10.3390/e27111159 · Entropy · 2025-11-14

## TL;DR

This paper introduces DM2, a deep active learning framework that estimates sample uncertainty through distance-weighted data mixing and pairs it with fast gradient adversarial training on near-boundary samples, improving robustness while reducing the number of labels needed.

## Contribution

The paper contributes the Distance-Measured Data Mixing (DM2) framework for uncertainty estimation and a boundary-aware adversarial training method for robust active learning.

## Key findings

- DM2 outperforms uncertainty- and diversity-based baselines across multiple tasks and data types.
- The adversarial training technique improves model robustness in noisy and imbalanced data scenarios.
- The approach reduces the number of labeled samples needed for effective learning.
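
To make the adversarial training finding concrete, here is a minimal sketch of joint training on original and adversarial samples. It assumes the "fast gradient" step is the fast gradient sign method (FGSM) and uses a plain linear softmax classifier in NumPy; the function names `fgsm` and `joint_step`, the step size `eps`, and the learning rate `lr` are illustrative choices, not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(x, y_onehot, W):
    # Mean cross-entropy of a linear softmax classifier with logits x @ W.
    p = softmax(x @ W)
    return -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))

def fgsm(x, y_onehot, W, eps=0.1):
    # Assumed attack: one signed-gradient step on the input.
    # For logits x @ W, the input gradient of the loss is (p - y) @ W.T.
    p = softmax(x @ W)
    grad_x = (p - y_onehot) @ W.T
    return x + eps * np.sign(grad_x)

def joint_step(x, y_onehot, W, eps=0.1, lr=0.5):
    # One gradient step on the originals and their adversarial counterparts together.
    x_adv = fgsm(x, y_onehot, W, eps)
    x_all = np.vstack([x, x_adv])
    y_all = np.vstack([y_onehot, y_onehot])
    p = softmax(x_all @ W)
    grad_W = x_all.T @ (p - y_all) / len(x_all)
    return W - lr * grad_W

# Toy near-boundary batch: two features, two classes (hypothetical data).
W = np.array([[-1.0, 1.0], [0.0, 0.0]])
x = np.array([[-0.2, 0.3], [0.4, -0.1]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])

x_adv = fgsm(x, y, W, eps=0.1)
W_new = joint_step(x, y, W, eps=0.1, lr=0.5)
```

In an active learning loop, `x` would hold only the newly labeled near-boundary samples, so the extra cost of the attack stays proportional to the query batch rather than the full labeled pool.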

## Abstract

Accurate uncertainty estimation in unlabeled data represents a fundamental challenge in active learning. Traditional deep active learning approaches suffer from a critical limitation: uncertainty-based selection strategies tend to concentrate excessively around noisy decision boundaries, while diversity-based methods may miss samples that are crucial for decision-making. This over-reliance on confidence metrics when employing deep neural networks as backbone architectures often results in suboptimal data selection. We introduce Distance-Measured Data Mixing (DM2), a novel framework that estimates sample uncertainty through distance-weighted data mixing to capture inter-sample relationships and the underlying data manifold structure. This approach enables informative sample selection across the entire data distribution while maintaining focus on near-boundary regions without overfitting to the most ambiguous instances. To address noise and instability issues inherent in boundary regions, we propose a boundary-aware feature fusion mechanism integrated with fast gradient adversarial training. This technique generates adversarial counterparts of selected near-boundary samples and trains them jointly with the original instances, thereby enhancing model robustness and generalization capabilities under complex or imbalanced data conditions. Comprehensive experiments across diverse tasks, model architectures, and data modalities demonstrate that our approach consistently surpasses strong uncertainty-based and diversity-based baselines while significantly reducing the number of labeled samples required for effective learning.
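
The abstract does not give the DM2 scoring formula, so the following is only a rough sketch of what distance-weighted mixing could look like. It assumes each unlabeled point is mixed with its `k` nearest labeled points, that the mixing weight decays exponentially with Euclidean distance, and that uncertainty is the mean prediction entropy over the mixed points. The name `mixing_uncertainty` and the parameters `k` and `tau` are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mixing_uncertainty(predict_proba, x_unlabeled, x_labeled, k=3, tau=1.0):
    # Hypothetical score: mean entropy of predictions on distance-weighted mixes.
    n = len(x_unlabeled)
    # Pairwise Euclidean distances, shape (n_unlabeled, n_labeled).
    diff = x_unlabeled[:, None, :] - x_labeled[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Indices of the k nearest labeled points for every unlabeled point.
    nearest = np.argsort(dist, axis=1)[:, :k]
    scores = np.zeros(n)
    for j in range(k):
        idx = nearest[:, j]
        d = dist[np.arange(n), idx]
        # Assumed weighting: closer labeled points are mixed in more strongly,
        # and the weight never exceeds 0.5, so the unlabeled point dominates.
        lam = 0.5 * np.exp(-d / tau)
        mixed = (1.0 - lam)[:, None] * x_unlabeled + lam[:, None] * x_labeled[idx]
        p = predict_proba(mixed)
        scores += -np.sum(p * np.log(p + 1e-12), axis=1)
    return scores / k

# Toy setup: a linear classifier whose decision boundary is the line x0 = 0.
W = np.array([[-1.0, 1.0], [0.0, 0.0]])
predict = lambda x: softmax(x @ W)
x_lab = np.array([[-2.0, 0.0], [2.0, 0.0]])
x_unl = np.array([[0.1, 0.0], [5.0, 0.0]])

scores = mixing_uncertainty(predict, x_unl, x_lab, k=2, tau=1.0)
query_order = np.argsort(-scores)  # most uncertain first
```

Because the score depends on labeled neighbors rather than on a single softmax output, a point's rank changes with where it sits relative to the labeled set, which is the inter-sample relationship the abstract emphasizes.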

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12651851/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12651851/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12651851/full.md

---
Source: https://tomesphere.com/paper/PMC12651851