Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

Jinseok Kim; Tae-Kyun Kim

arXiv:2403.10255 · cs.CV · March 18, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a novel pipeline combining latent diffusion, auto-encoding, and implicit neural decoding to enable efficient, high-quality, and arbitrary-scale image super-resolution and generation, outperforming existing methods in quality, diversity, and speed.

Contribution

The paper presents an approach to arbitrary-scale image super-resolution and generation that pairs a latent diffusion model with an implicit neural decoder, reducing inference time and memory use while keeping outputs consistent across scales.

Findings

01. Outperforms existing methods in image quality and diversity.

02. Achieves faster inference and lower memory usage.

03. Maintains scale consistency across different resolutions.

Abstract

Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications. Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts. Additionally, they offer neither enough diversity in output images nor image consistency across scales. The most relevant work applied Implicit Neural Representation (INR) to the denoising diffusion model to obtain continuous-resolution yet diverse and high-quality SR results. Since this model operates in the image space, the higher the resolution of the output image, the more memory and inference time are required, and it also does not maintain scale-specific consistency. We propose a novel pipeline that can super-resolve an input image or generate a novel image from random noise at arbitrary scales. The method consists of a…
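
The idea of decoding a fixed-size latent at any output resolution can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a generic coordinate-based decoder that, for each output pixel, looks up the nearest latent feature, appends the pixel's offset from that latent cell's centre, and passes the result through a small MLP. The names `make_coords` and `implicit_decode` are hypothetical, and the weights are random and untrained, so the output is not a meaningful image; only the resolution-independent querying is shown.

```python
import numpy as np

def make_coords(n):
    # Pixel-centre coordinates of an n-cell axis, normalised to [-1, 1].
    return (np.arange(n) + 0.5) / n * 2.0 - 1.0

def implicit_decode(feat, out_h, out_w, w1, b1, w2, b2):
    # feat: latent feature map of shape (c, h, w); its size is fixed,
    # while (out_h, out_w) can be any target resolution.
    c, h, w = feat.shape
    ys, xs = make_coords(out_h), make_coords(out_w)
    # Index of the latent cell that contains each query coordinate.
    iy = np.clip(np.floor((ys + 1.0) / 2.0 * h).astype(int), 0, h - 1)
    ix = np.clip(np.floor((xs + 1.0) / 2.0 * w).astype(int), 0, w - 1)
    # Offset of each query from its cell centre, in units of one cell.
    dy = (ys - make_coords(h)[iy]) * h / 2.0
    dx = (xs - make_coords(w)[ix]) * w / 2.0
    nearest = feat[:, iy[:, None], ix[None, :]]            # (c, out_h, out_w)
    rel = np.stack(np.broadcast_arrays(dy[:, None], dx[None, :]))  # (2, out_h, out_w)
    x = np.concatenate([nearest, rel], axis=0).reshape(c + 2, -1).T
    hidden = np.maximum(x @ w1 + b1, 0.0)                   # ReLU MLP layer
    rgb = hidden @ w2 + b2                                  # (out_h * out_w, 3)
    return rgb.reshape(out_h, out_w, 3)

# Assumed toy sizes: an 8-channel 16x16 latent and a 32-unit hidden layer.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
w1, b1 = rng.standard_normal((10, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.standard_normal((32, 3)) * 0.1, np.zeros(3)

out_2x = implicit_decode(feat, 32, 32, w1, b1, w2, b2)     # integer scale
out_3_5x = implicit_decode(feat, 56, 56, w1, b1, w2, b2)   # non-integer scale
```

The same latent and the same weights serve both calls; only the query grid changes, which is why the scale factor does not have to be an integer in this kind of decoder.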

Taxonomy

Topics: Image Processing and 3D Reconstruction

Methods: Diffusion · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings