RobustFormer: Noise-Robust Pre-training for Images and Videos
Ashish Bastola, Nishant Luitel, Hao Wang, Danda Pani Paudel, Roshani Poudel, Abolfazl Razi

TL;DR
RobustFormer introduces a wavelet transform-based pre-training framework for images and videos that enhances noise robustness and reduces computational complexity, outperforming baseline models under noisy conditions.
Contribution
It is the first DWT-based method compatible with both video inputs and MAE pre-training; it eliminates the need for an inverse wavelet transform and focuses on multi-scale, noise-resilient features.
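The following is a minimal sketch of why no inverse transform is needed. It assumes a single-level orthonormal Haar DWT applied to each frame of a grayscale clip, followed by standard MAE-style random masking of patch tokens. The helpers `haar_dwt2`, `video_to_wavelet_tokens`, and `random_mask`, the 4-subband token layout, and the 75% mask ratio are hypothetical illustration choices, not RobustFormer's published design. Because both the encoder input and the reconstruction target are wavelet coefficients, nothing in this pipeline calls an IDWT.

```python
import numpy as np


def haar_dwt2(frame):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array with even H and W.

    Returns four stacked subbands (approximation + three detail bands), each (H/2, W/2).
    """
    a, b = frame[0::2, 0::2], frame[0::2, 1::2]
    c, d = frame[1::2, 0::2], frame[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a - b + c - d) / 2  # differences between horizontal neighbours
    hl = (a + b - c - d) / 2  # differences between vertical neighbours
    hh = (a - b - c + d) / 2  # diagonal differences
    return np.stack([ll, lh, hl, hh])


def video_to_wavelet_tokens(video, patch=4):
    """Hypothetical tokenizer: per-frame Haar DWT, then non-overlapping patches.

    video: (T, H, W) clip with H and W divisible by 2 * patch.
    Returns tokens of shape (T * h * w, 4 * patch * patch); each token holds the
    four subbands of one spatial patch of one frame.
    """
    t, height, width = video.shape
    bands = np.stack([haar_dwt2(f) for f in video])  # (T, 4, H/2, W/2)
    h, w = height // 2 // patch, width // 2 // patch
    x = bands.reshape(t, 4, h, patch, w, patch)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (T, h, w, 4, patch, patch)
    return x.reshape(t * h * w, 4 * patch * patch)


def random_mask(tokens, mask_ratio=0.75, rng=None):
    """MAE-style random masking: keep a random subset of tokens for the encoder.

    Returns (visible tokens, their indices, masked target tokens, their indices).
    The targets are wavelet coefficients, so a reconstruction loss can be
    computed directly in the wavelet domain.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = tokens.shape[0]
    n_keep = int(round(n * (1 - mask_ratio)))
    perm = rng.permutation(n)
    keep, masked = np.sort(perm[:n_keep]), np.sort(perm[n_keep:])
    return tokens[keep], keep, tokens[masked], masked


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 32, 32))  # 8 frames of 32x32 dummy input
tokens = video_to_wavelet_tokens(clip, patch=4)
visible, keep_idx, target, mask_idx = random_mask(tokens, 0.75, rng)
print(tokens.shape, visible.shape, target.shape)  # (128, 64) (32, 64) (96, 64)
```

The orthonormal Haar transform preserves energy, and tokenization only rearranges coefficients. As a result, the summed squared token values equal those of the input clip, which makes a cheap sanity check for this kind of pipeline.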
Findings
Up to 8% accuracy improvement on noise-corrupted ImageNet-C.
Up to 2.7% accuracy gain on ImageNet-P benchmarks.
Up to 13% higher accuracy on UCF-101 under severe noise.
Abstract
While deep learning models such as transformers have revolutionized time-series and vision tasks, they remain highly susceptible to noise and often overfit to noisy patterns rather than learning robust features. This issue is exacerbated in vision transformers, which rely on pixel-level details that can easily be corrupted. To address this, we leverage the discrete wavelet transform (DWT) for its ability to decompose a signal into multi-resolution layers, isolating noise primarily in the high-frequency bands while preserving essential low-frequency information for resilient feature learning. Conventional DWT-based methods, however, suffer from computational inefficiency because they require a subsequent inverse discrete wavelet transform (IDWT) step. In this work, we introduce RobustFormer, a novel framework that enables noise-robust masked autoencoder (MAE) pre-training for both images and…
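The abstract's frequency argument can be checked numerically. The sketch below assumes a single-level orthonormal Haar DWT, the simplest wavelet; this summary does not say which wavelet RobustFormer uses. It decomposes a smooth synthetic image and additive white Gaussian noise separately, which is valid because the DWT is linear. The test image, noise level, and helper names are hypothetical. For this input, nearly all of the clean image's energy lands in the LL band, while the white noise spreads evenly over the four subbands. The detail bands are therefore noise-dominated, and LL has a markedly higher signal-to-noise ratio than the input.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array with even H and W."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return {
        "LL": (a + b + c + d) / 2,  # local average: low-frequency approximation
        "LH": (a - b + c - d) / 2,  # horizontal-neighbour differences (detail)
        "HL": (a + b - c - d) / 2,  # vertical-neighbour differences (detail)
        "HH": (a - b - c + d) / 2,  # diagonal differences (detail)
    }


def energy(x):
    """Sum of squared coefficients."""
    return float(np.sum(x ** 2))


rng = np.random.default_rng(0)
n = 64
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
clean = np.sin(2 * np.pi * i / n) * np.cos(2 * np.pi * j / n)  # smooth test image
noise = 0.5 * rng.standard_normal((n, n))                     # white Gaussian noise

# The DWT is linear, so signal and noise can be decomposed separately.
sig_bands, noise_bands = haar_dwt2(clean), haar_dwt2(noise)
input_snr = energy(clean) / energy(noise)
snr = {k: energy(sig_bands[k]) / energy(noise_bands[k]) for k in sig_bands}
ll_share = energy(sig_bands["LL"]) / energy(clean)

print(f"input SNR: {input_snr:.2f}")
for k, v in snr.items():
    print(f"{k} SNR: {v:.4f}")
print(f"share of clean-image energy in LL: {ll_share:.3f}")
```

For white noise, the separation comes from the signal concentrating in LL rather than from the noise moving into the detail bands. The net effect matches the abstract: the high-frequency subbands are where noise dominates.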
Taxonomy
Topics: Image and Signal Denoising Methods · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Softmax · Attention Is All You Need
