SwinIA: Self-Supervised Blind-Spot Image Denoising without Convolutions
Mikhail Papkov, Pavel Chizhov, Leopold Parts

TL;DR
SwinIA is a fully-transformer architecture for self-supervised blind-spot image denoising. It trains without masking or prior knowledge of the noise and reaches state-of-the-art results on several common benchmarks.
Contribution
SwinIA is the first fully-transformer autoencoder for self-supervised denoising. It removes the need for masking and for information about the noise model.
Findings
Establishes the state of the art on several common benchmarks.
Requires no masking and no prior knowledge of the clean data or noise distribution.
Trains end-to-end with a simple mean squared error loss.
Abstract
Self-supervised image denoising implies restoring the signal from a noisy image without access to the ground truth. State-of-the-art solutions for this task rely on predicting masked pixels with a fully-convolutional neural network. This most often requires multiple forward passes, information about the noise model, or intricate regularization functions. In this paper, we propose a Swin Transformer-based Image Autoencoder (SwinIA), the first fully-transformer architecture for self-supervised denoising. The flexibility of the attention mechanism helps to fulfill the blind-spot property that convolutional counterparts normally approximate. SwinIA can be trained end-to-end with a simple mean squared error loss without masking and does not require any prior knowledge about clean data or noise distribution. Simple to use, SwinIA establishes the state of the art on several common benchmarks.
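The blind-spot property mentioned in the abstract can be illustrated with a small attention sketch. This is not the SwinIA implementation: it is a minimal single-head example in NumPy, and it assumes (for illustration only) that queries are built from positional embeddings alone, that keys and values come from the pixels, and that the attention diagonal is masked so no position attends to itself. The function name `blind_spot_attention` and all weight matrices are hypothetical.

```python
import numpy as np


def blind_spot_attention(pixels, pos_emb, w_q, w_k, w_v):
    """Single-head attention in which output i never sees pixel i.

    pixels:  (n, c) flattened noisy pixel values
    pos_emb: (n, d) positional embeddings (assumed, not from the paper)
    w_q: (d, d), w_k: (c, d), w_v: (c, c) hypothetical projection weights
    """
    q = pos_emb @ w_q               # queries depend on position only
    k = pixels @ w_k + pos_emb      # keys depend on pixel content and position
    v = pixels @ w_v                # values carry pixel content
    scores = q @ k.T / np.sqrt(q.shape[1])
    np.fill_diagonal(scores, -np.inf)             # a pixel cannot attend to itself
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
n, c, d = 16, 1, 8
noisy = rng.normal(size=(n, c))
pos = rng.normal(size=(n, d))
w_q = rng.normal(size=(d, d))
w_k = rng.normal(size=(c, d))
w_v = rng.normal(size=(c, c))

prediction = blind_spot_attention(noisy, pos, w_q, w_k, w_v)
# Self-supervised objective: mean squared error against the noisy input itself.
loss = np.mean((prediction - noisy) ** 2)
```

Because output i is computed without access to pixel i, minimizing the mean squared error against the noisy input cannot be solved by copying the input, which is why no explicit input masking is needed in this toy setting.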
Taxonomy
Topics: Image and Signal Denoising Methods · Photoacoustic and Ultrasonic Imaging · Image Processing Techniques and Applications
Methods: Softmax · Linear Layer · Relative Position Encodings · Layer Normalization · Residual Connection · Dense Connections · Multi-Head Attention · Attention Is All You Need · Swin Transformer
