Balancing Accuracy, Calibration, and Efficiency in Active Learning with Vision Transformers Under Label Noise

Moseli Mots'oehli; Hope Mogale; Kyungim Baek

arXiv:2505.04375 · cs.CV · May 8, 2025

TL;DR

This paper evaluates the performance of various vision transformer models under label noise in active learning scenarios, highlighting the robustness of larger models and the impact of patch size on accuracy and calibration.

Contribution

It provides a comprehensive comparison of vision transformer configurations under label noise, offering practical insights for deploying efficient models in noisy, resource-limited settings.

Findings

01 Larger ViT models outperform smaller ones in accuracy and calibration under noise.

02 Swin Transformers show weaker robustness to label noise across configurations.

03 Smaller patch sizes do not always improve performance, and they increase computational cost.

Abstract

Fine-tuning convolutional neural networks pre-trained on ImageNet for downstream tasks is well established. However, the impact of model size on the performance of vision transformers in similar scenarios, particularly under label noise, remains largely unexplored. Given the utility and versatility of transformer architectures, this study investigates their practicality under low-budget constraints and noisy labels. We explore how classification accuracy and calibration are affected by symmetric label noise in active learning settings, evaluating four vision transformer configurations (Base and Large with 16x16 and 32x32 patch sizes) and three Swin Transformer configurations (Tiny, Small, and Base) on the CIFAR10 and CIFAR100 datasets under varying label noise rates. Our findings show that larger ViT models (ViTl32 in particular) consistently outperform their smaller counterparts in both…
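
To make the two quantities in the abstract concrete, the following is a minimal sketch and not the authors' code. It assumes one common convention for symmetric label noise (each label is flipped with probability `noise_rate` to a uniformly chosen different class) and uses expected calibration error (ECE) as one common calibration measure; the excerpt above does not state which convention or calibration metric the paper uses. The function names `add_symmetric_label_noise` and `expected_calibration_error` are hypothetical.

```python
import numpy as np


def add_symmetric_label_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label with probability `noise_rate` to a different class.

    Assumption: the replacement class is drawn uniformly from the other
    num_classes - 1 classes. Some papers also allow the original class.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    flip = rng.random(labels.shape[0]) < noise_rate
    # An offset in [1, num_classes - 1] guarantees the new label differs.
    offsets = rng.integers(1, num_classes, size=labels.shape[0])
    noisy[flip] = (labels[flip] + offsets[flip]) % num_classes
    return noisy


def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """Weighted average gap between accuracy and confidence over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            # in_bin.mean() is the fraction of samples falling in this bin.
            ece += in_bin.mean() * gap
    return float(ece)


if __name__ == "__main__":
    clean = np.zeros(1000, dtype=int)
    noisy = add_symmetric_label_noise(clean, noise_rate=0.4, num_classes=10)
    print("fraction of labels flipped:", (noisy != clean).mean())
```

In an active learning loop, the noise function would stand in for an imperfect annotator each time a batch of queried samples is labeled, and a calibration measure would be computed on held-out predictions after each round.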

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Explainable Artificial Intelligence (XAI)

Methods: Attention Is All You Need · Linear Layer · Stochastic Depth · Multi-Head Attention · Dense Connections · Adam · Swin Transformer · Dropout · Vision Transformer · Layer Normalization