RobustFormer: Noise-Robust Pre-training for images and videos

Ashish Bastola; Nishant Luitel; Hao Wang; Danda Pani Paudel; Roshani Poudel; Abolfazl Razi

arXiv:2411.13040·cs.CV·January 12, 2026

TL;DR

RobustFormer introduces a wavelet-transform-based pre-training framework for images and videos that improves noise robustness and reduces computational complexity, outperforming baseline models under noisy conditions.

Contribution

RobustFormer is the first DWT-based method compatible with both video inputs and MAE pre-training. It eliminates the need for an inverse transform and focuses on learning multi-scale, noise-resilient features.
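To make "compatible with video inputs" concrete, here is a minimal sketch of a single-level, separable 3D Haar DWT over a grayscale clip of shape (T, H, W). The summary does not say which wavelet, how many decomposition levels, or what tokenization RobustFormer uses, so the Haar choice and the hypothetical helpers `haar_1d` and `haar_dwt3` are illustrative assumptions, not the authors' code. Because this transform is orthonormal, the eight subbands together preserve all of the clip's information and energy, which is one reason a model could in principle work on coefficients directly without an IDWT back to pixels.

```python
import numpy as np

def haar_1d(x, axis):
    """Orthonormal 1D Haar step along `axis` (assumed even length).

    Returns (low, high): scaled pairwise sums and differences."""
    x = np.moveaxis(np.asarray(x, dtype=np.float64), axis, 0)
    if x.shape[0] % 2:
        raise ValueError("axis length must be even")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # smooth / low-frequency part
    high = (even - odd) / np.sqrt(2.0)  # detail / high-frequency part
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)

def haar_dwt3(video):
    """Single-level separable 3D Haar DWT of a (T, H, W) clip (hypothetical helper).

    Returns a dict with keys such as 'LLL' or 'HLH' (letters in time, height,
    width order), each mapping to a subband of shape (T/2, H/2, W/2)."""
    bands = {"": np.asarray(video, dtype=np.float64)}
    for axis in range(3):  # time, then height, then width
        split = {}
        for key, arr in bands.items():
            low, high = haar_1d(arr, axis)
            split[key + "L"] = low
            split[key + "H"] = high
        bands = split
    return bands

rng = np.random.default_rng(0)
frame = rng.normal(size=(32, 32))
static_clip = np.stack([frame] * 8)  # (T=8, H=32, W=32): same frame repeated, no motion
bands = haar_dwt3(static_clip)
for key in sorted(bands):
    print(key, bands[key].shape, round(float(np.sum(bands[key] ** 2)), 3))
```

For a static clip, every subband whose first (temporal) letter is `H` is exactly zero, since consecutive frames are identical. Temporal detail subbands therefore respond only to change over time, which a video model can exploit alongside the spatial decomposition.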

Findings

1. Up to 8% accuracy improvement on noisy ImageNet-C benchmarks.
2. Up to 2.7% accuracy gain on ImageNet-P benchmarks.
3. Up to 13% higher accuracy on UCF-101 under severe noise.

Abstract

While deep learning models such as transformers have revolutionized time-series and vision tasks, they remain highly susceptible to noise and often overfit to noisy patterns rather than learning robust features. This issue is exacerbated in vision transformers, which rely on pixel-level details that are easily corrupted. To address this, we leverage the discrete wavelet transform (DWT) for its ability to decompose data into multi-resolution layers, isolating noise primarily in the high-frequency components while preserving essential low-frequency information for resilient feature learning. Conventional DWT-based methods, however, suffer from computational inefficiency because they require a subsequent inverse discrete wavelet transform (IDWT) step. In this work, we introduce RobustFormer, a novel framework that enables noise-robust masked autoencoder (MAE) pre-training for both images and…
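The abstract's core intuition (DWT separates low-frequency content from high-frequency noise) can be checked numerically. The sketch below is a generic single-level orthonormal 2D Haar DWT, not RobustFormer's implementation; the Haar wavelet, the synthetic sinusoidal "image", the noise level, and the helper names `haar_dwt2` and `detail_energy_fraction` are all illustrative assumptions.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array (H, W even).

    Returns (LL, D1, D2, D3): one approximation subband and three detail
    subbands, each of shape (H/2, W/2)."""
    x = np.asarray(x, dtype=np.float64)
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # block average (low-frequency approximation)
    d1 = (a + b - c - d) / 2.0  # top row minus bottom row
    d2 = (a - b + c - d) / 2.0  # left column minus right column
    d3 = (a - b - c + d) / 2.0  # diagonal detail
    return ll, d1, d2, d3

def detail_energy_fraction(x):
    """Fraction of total energy that lands in the three detail (high-frequency) subbands."""
    ll, *details = haar_dwt2(x)
    detail = sum(np.sum(s ** 2) for s in details)
    return detail / (np.sum(ll ** 2) + detail)

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
clean = np.sin(2 * np.pi * xx / 64) + np.cos(2 * np.pi * yy / 64)  # smooth synthetic "image"
noise = rng.normal(0.0, 0.3, size=clean.shape)  # i.i.d. Gaussian noise

print(f"clean: {detail_energy_fraction(clean):.3f}")
print(f"noise: {detail_energy_fraction(noise):.3f}")
```

The smooth image keeps nearly all of its energy in the LL subband, while i.i.d. Gaussian noise spreads evenly across coefficients, so about three quarters of its energy falls into the three detail subbands. The signal-to-noise ratio is therefore much higher in LL than in the original pixels, which is the property a noise-robust model can lean on.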

Taxonomy

Topics: Image and Signal Denoising Methods · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Softmax · Attention Is All You Need