SPI-GAN: Denoising Diffusion GANs with Straight-Path Interpolations

Jinsung Jeon; Noseong Park

arXiv:2206.14464 · cs.LG · March 15, 2024 · 1 citation

Open Access · 3 Reviews

TL;DR

SPI-GAN introduces a novel GAN-based denoising approach with straight-path interpolation, significantly reducing sampling time while maintaining high quality and diversity comparable to score-based models.

Contribution

The paper proposes a new GAN architecture utilizing straight-path interpolation and continuous mapping to simplify denoising, improving efficiency without sacrificing quality.

Findings

01 Achieves high sampling quality and diversity on CIFAR-10 and CelebA-HQ-256.

02 Reduces sampling time compared to score-based generative models.

03 Balances quality, diversity, and efficiency effectively.

Abstract

Score-based generative models (SGMs) show state-of-the-art sampling quality and diversity. However, their training and sampling complexity is notoriously high because of their highly complicated forward and reverse processes, so they are not suitable for resource-limited settings. To solve this problem, learning a simpler process has recently attracted much attention. We present an enhanced GAN-based denoising method, called SPI-GAN, using our proposed straight-path interpolation definition. To this end, we propose a GAN architecture that i) denoises along the straight path and ii) is characterized by a continuous mapping neural network that imitates the denoising path. This approach drastically reduces the sampling time while achieving sampling quality and diversity as high as those of SGMs. As a result, SPI-GAN is one of the best-balanced models among the sampling quality, diversity, and time for CIFAR-10,…
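The abstract names a straight-path interpolation but this page does not give its formula. As a rough illustration only, the sketch below assumes the simplest reading: a linear blend between a noise sample and a clean sample, indexed by a position u in [0, 1]. The function name, the direction of u (0 at noise, 1 at the clean sample), and the toy vectors are assumptions for illustration, not the authors' definition or code.

```python
import random


def straight_path_interpolation(x_clean, x_noise, u):
    """Return the point at position u on the straight line from x_noise to x_clean.

    Assumed convention (not taken from the paper): u = 0 gives the noise
    sample and u = 1 gives the clean sample.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    if len(x_clean) != len(x_noise):
        raise ValueError("x_clean and x_noise must have the same length")
    # Element-wise convex combination of the two endpoints.
    return [u * c + (1.0 - u) * n for c, n in zip(x_clean, x_noise)]


# Toy 4-dimensional "image" and a Gaussian noise vector of the same size.
rng = random.Random(0)
x_clean = [0.2, -0.5, 0.9, 0.0]
x_noise = [rng.gauss(0.0, 1.0) for _ in x_clean]

# Five evenly spaced points along the path, from pure noise to the clean sample.
path = [straight_path_interpolation(x_clean, x_noise, k / 4) for k in range(5)]
```

Under this reading, every intermediate point lies on a single line segment, so a model that learns to move along it follows a much simpler trajectory than the multi-step reverse process of an SGM.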

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 5

Strengths

1. Combining diffusion models with GANs for enhanced diversity and a fast generation process presents an intriguing research topic.
2. Employing Neural Ordinary Differential Equations (NODEs) to map embeddings, while conditioning the latent code input to the generator on time, is an interesting approach.
3. The experimental section demonstrates performance improvements over the baselines.
4. The presentation is clear and easy to follow.

Weaknesses

1. The rationale behind integrating the diffusion process into the generator, particularly in terms of applying it to the latent code input, remains unclear. This might be attributable to the discriminator being exposed to a variety of augmented images, potentially helping to avert overfitting. Meanwhile, the results in Table 3 depict a noticeably inferior performance of SPI-GAN in comparison to both Diffusion-GAN and StyleGANs, raising questions about the soundness of the approach of employing

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 3

Strengths

The manuscript is well-written and easy to follow. It explains the gap (trilemma) and addresses the gap with a novel solution. The explanation is supported by formulation and figures. The evaluations are well-performed and convincing. They share their code and the trained networks for reproducibility.

Weaknesses

The discussion of limitations is short and could be expanded. Although SPI-GAN recasts the task in a simpler form that is easier to learn, its training time is longer.

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

- The authors have succeeded in presenting their research in a clear way, and the background work is relevant.
- The integration of optimal transport ideas into the context of diffusion is a novel and intriguing concept. This innovative approach not only contributes to the paper's originality but also has the potential to inspire further research in this promising direction.
- The paper's method seems to achieve a shorter interpolation path between images and noise. This outcome opens up new opp

Weaknesses

- The NODE map, as evidenced in Figure 6, appears to offer marginal improvements over the vanilla mapping network from StyleGAN2. Consequently, the novelty of the method diminishes, as it seems to reduce to an image and time-conditioned StyleGAN. The authors should address how the method distinguishes itself more significantly from existing approaches.
- In a quantitative image quality comparison, the proposed method does not appear to clearly outperform vanilla StyleGAN2 in Tables 1-3. For inst

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies

Methods: Diffusion