Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment

Kejia Zhang; Juanjuan Weng; Shaozi Li; Zhiming Luo

arXiv:2408.06079 · cs.CV · July 10, 2025

TL;DR

This paper introduces DHAT, a novel adversarial training method that reduces background feature bias in DNNs, significantly improving adversarial robustness and generalization on CIFAR and ImageNet-1K benchmarks.

Contribution

The paper proposes a debiased high-confidence logit alignment method (DHAT) to mitigate background feature bias in adversarial training, enhancing robustness and generalization.

Findings

01. DHAT achieves state-of-the-art robustness on CIFAR and ImageNet-1K.

02. DHAT significantly reduces reliance on background features during training.

03. DHAT improves model generalization by mitigating feature bias.

Abstract

Despite the remarkable progress of deep neural networks (DNNs) in various visual tasks, their vulnerability to adversarial examples raises significant security concerns. Recent adversarial training methods leverage inverse adversarial attacks to generate high-confidence examples, aiming to align adversarial distributions with high-confidence class regions. However, our investigation reveals that under inverse adversarial attacks, high-confidence outputs are influenced by biased feature activations, causing models to rely on background features that lack a causal relationship with the labels. This spurious correlation bias leads to overfitting irrelevant background features during adversarial training, thereby degrading the model's robust performance and generalization capabilities. To address this issue, we propose Debiased High-Confidence Adversarial Training (DHAT), a novel approach…
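
As a rough illustration of the inverse adversarial attacks mentioned in the abstract, the sketch below contrasts a standard sign-gradient attack, which ascends the cross-entropy loss, with its inverse, which descends the loss to produce a higher-confidence example inside the same perturbation budget. This is not the paper's DHAT implementation: the toy linear softmax classifier, the helper names (`perturb`, `confidence`), and the values of `eps`, `alpha`, and `steps` are all assumptions chosen for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_of(W, x):
    # Toy linear classifier (an assumption): one weight row per class.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def confidence(W, x, y):
    # Softmax probability assigned to the labelled class y.
    return softmax(logits_of(W, x))[y]

def loss_grad_wrt_input(W, x, y):
    # For a linear model, d(cross-entropy)/dx = W^T (p - onehot(y)).
    p = softmax(logits_of(W, x))
    coeff = [p_k - (1.0 if k == y else 0.0) for k, p_k in enumerate(p)]
    return [sum(coeff[k] * W[k][d] for k in range(len(W)))
            for d in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def perturb(W, x, y, eps, alpha, steps, inverse):
    # Iterative sign-gradient perturbation inside an L-infinity ball of
    # radius eps. inverse=False ascends the loss (standard attack);
    # inverse=True descends it (inverse attack, higher confidence).
    direction = -1.0 if inverse else 1.0
    delta = [0.0] * len(x)
    for _ in range(steps):
        x_cur = [xi + di for xi, di in zip(x, delta)]
        g = loss_grad_wrt_input(W, x_cur, y)
        delta = [max(-eps, min(eps, di + direction * alpha * sign(gi)))
                 for di, gi in zip(delta, g)]
    return [xi + di for xi, di in zip(x, delta)]

# Hypothetical two-class, two-feature example.
W = [[1.0, -0.5], [-1.0, 0.5]]
x, y = [0.2, 0.1], 0
x_adv = perturb(W, x, y, eps=0.1, alpha=0.025, steps=10, inverse=False)
x_inv = perturb(W, x, y, eps=0.1, alpha=0.025, steps=10, inverse=True)
print(confidence(W, x_adv, y), confidence(W, x, y), confidence(W, x_inv, y))
```

In this toy setting the inverse example receives a higher confidence on the true class than the clean input, and the adversarial example a lower one. The abstract's concern is that, in a deep network, part of that confidence gain can come from background features rather than the object itself; a linear toy model has no notion of background, so the sketch shows only the mechanics of the attack.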

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Forensic and Genetic Research

Methods: Softmax · Attention Is All You Need · ALIGN