Masked Diffusion as Self-supervised Representation Learner

Zixuan Pan; Jianxu Chen; Yiyu Shi

arXiv:2308.05695 · cs.CV · April 16, 2024 · 6 cites

TL;DR

This paper introduces the Masked Diffusion Model (MDM), a self-supervised representation learner that replaces the additive Gaussian noise of diffusion models with masking and improves semantic segmentation on both medical and natural images.

Contribution

The paper proposes MDM, a diffusion-based self-supervised learning approach that uses masking instead of noise and improves segmentation performance, particularly in few-shot settings.

Findings

1. Outperforms prior benchmarks in medical and natural image segmentation
2. Achieves significant improvements in few-shot segmentation scenarios
3. Demonstrates the effectiveness of masking in diffusion models

Abstract

Denoising diffusion probabilistic models have recently demonstrated state-of-the-art generative performance and have been used as strong pixel-level representation learners. This paper decomposes the interrelation between the generative capability and representation learning ability inherent in diffusion models. We present the masked diffusion model (MDM), a scalable self-supervised representation learner for semantic segmentation, substituting the conventional additive Gaussian noise of traditional diffusion with a masking mechanism. Our proposed approach convincingly surpasses prior benchmarks, demonstrating remarkable advancements in both medical and natural image semantic segmentation tasks, particularly in few-shot scenarios.
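The substitution described in the abstract, masking in place of additive Gaussian noise, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear schedule in `mask_ratio`, the helper `mask_patches`, the patch size, and the zero fill value are all assumptions chosen for illustration. The idea shown is only that a sampled timestep `t` controls how much of the image is hidden, so that holding `t` fixed yields a constant masking ratio.

```python
import random


def mask_ratio(t, num_steps):
    """Hypothetical linear schedule: fraction of patches hidden at step t.

    Assumption for illustration only; the paper's schedule may differ.
    """
    if not 1 <= t <= num_steps:
        raise ValueError("t must be in [1, num_steps]")
    return t / (num_steps + 1)


def mask_patches(image, patch, t, num_steps, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    `image` is a list of rows of floats. Returns the corrupted copy and the
    set of top-left coordinates of the hidden patches.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    coords = [(r, c)
              for r in range(0, height, patch)
              for c in range(0, width, patch)]
    # Round down so that at least one patch stays visible at t = num_steps
    # whenever the grid has no more than num_steps + 1 patches.
    n_hidden = int(mask_ratio(t, num_steps) * len(coords))
    hidden = set(rng.sample(coords, n_hidden))
    corrupted = [row[:] for row in image]  # copy; the input is left untouched
    for r0, c0 in hidden:
        for r in range(r0, r0 + patch):
            for c in range(c0, c0 + patch):
                corrupted[r][c] = 0.0  # assumed fill value for hidden pixels
    return corrupted, hidden


# Usage: draw a random timestep per training sample, then corrupt the image.
rng = random.Random(0)
image = [[1.0] * 8 for _ in range(8)]
t = rng.randint(1, 100)
corrupted, hidden = mask_patches(image, patch=2, t=t, num_steps=100, rng=rng)
```

A network would then be trained to reconstruct `image` from `corrupted`; the reconstruction loss is omitted here because its exact form is not given in this summary.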

Peer Reviews

Decision: Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The statistical results shown in Table 1 look promising if all the methods are compared fairly.

Weaknesses

1. The writing of this paper needs to be improved. Many claims in the introduction section are not very well-supported (e.g. "such efforts risk deviating from the theoretical underpinnings of diffusions") and are not very well organized.
2. It is not convincing enough to conclude that the representation learned is better while only tested on segmentation downstream tasks.
3. The choice of SSIM over MSE is rather empirical and not well justified.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

- The paper presents extensive experimental evaluations on 2 natural and 2 medical image data sets with ablation studies.

Weaknesses

- My main concern is the novelty of the method. The paper mentions that with the fixed t, the method degrades to a vanilla masked autoencoder with SSIM loss. This basically means that the only contribution of the paper is masking the image with a dynamic masking ratio during training, which concerns me regarding the contribution of the paper.
- Although the improvement achieved by this small change is interesting on Glas 10% case (IOU is 76.19 for MAE and 82.70 MDM with MSE; which is quite a si

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. **Novel Concept**: The paper presents a new approach in self-supervised learning using diffusion models. The probabilistic mask for data occlusion offers an alternative to traditional static methods, suggesting a different way to approach representation learning.
2. **Empirical Evidence**: The results and ablation studies provide evidence of the method's performance. The proposed technique shows improvements over vanilla MAE, DDPM, and certain traditional models on segmentation d

Weaknesses

Areas of Improvement for the Paper:

1. **Benchmarking for Segmentation Tasks**: The primary focus on segmentation necessitates benchmarking against specialized self-supervised learning (SSL) methods designed for this task, both at the instance-level and pixel/patch-level. A direct comparison with methods such as Leopart, IIC, MaskContrast, DenseCL, MoCoV2, and DINO on standard datasets like COCO and PVOC would provide a holistic evaluation. Refer to paper: Self-Supervised Learning of Object Par

Code & Models

Repositories

zx-pan/mdm (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Computational Physics and Python Applications

Methods: Diffusion