TL;DR
This paper introduces DiffDIS, a diffusion-based high-resolution image segmentation model that leverages pre-trained diffusion models and auxiliary edge tasks to achieve fast, accurate, and detailed object segmentation.
Contribution
It proposes a novel diffusion-driven segmentation approach that reduces inference time and enhances boundary detail preservation using a task-specific denoising strategy and auxiliary edge generation.
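As a rough illustration of why a single denoising step can be enough, the sketch below uses the standard DDPM closed-form estimate of a clean sample from one noise prediction. It is a generic identity, not DiffDIS's actual implementation: this summary does not state which parameterization the paper uses, and the names `one_step_x0`, `alpha_bar_t`, and the toy latent shape are assumptions made for the example.

```python
import numpy as np


def one_step_x0(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample x_0 from a noisy sample x_t in one step.

    Assumes the standard DDPM forward process
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    and simply inverts it using the predicted noise eps_pred.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)


# Toy check with a hypothetical 4x8x8 "mask latent" (shape chosen arbitrarily).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
eps = rng.standard_normal(x0.shape)
alpha_bar_t = 0.3  # assumed cumulative noise-schedule value at timestep t

# Forward-noise the latent, then recover it from a (here, perfect) noise prediction.
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
x0_hat = one_step_x0(x_t, eps, alpha_bar_t)
```

With a perfect noise prediction the estimate recovers the clean latent exactly. In practice the quality of a one-step result depends on how well the network's single prediction matches the true noise, which is where a task-specific denoising strategy would matter.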
Findings
Achieves state-of-the-art results on the DIS5K dataset.
Demonstrates rapid inference while maintaining high segmentation accuracy.
Effectively preserves fine object boundaries in high-resolution images.
Abstract
In the realm of high-resolution (HR), fine-grained image segmentation, the primary challenge is balancing broad contextual awareness with the precision required for detailed object delineation, capturing intricate details and the finest edges of objects. Diffusion models, trained on vast datasets comprising billions of image-text pairs, such as SD V2.1, have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness, making them an attractive solution for high-resolution image segmentation. To this end, we propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models, specifically designed for high-resolution, fine-grained object segmentation. By leveraging the robust generalization capabilities and rich, versatile image representation prior of…
Peer Reviews
Decision: ICLR 2025 Poster
1. Multi-level Design Innovations: The paper combines single-step denoising, edge-assisted generation, and multi-scale conditional injection to address the challenges of high-resolution segmentation, balancing speed and detail retention effectively.
2. Comprehensive Experiments: The experimental setup on the DIS5K dataset is thorough, with comparisons to multiple specialized and general segmentation models. Ablation studies illustrate each component’s contribution, supporting the rationale behin…

1. Clarify Novelty of the Single-Step Denoising: While the single-step denoising strategy indeed boosts inference efficiency, a similar concept has been explored in models like GenPercept. I suggest that the authors clarify whether DiffDIS’s single-step denoising incorporates task-specific optimizations for DIS tasks, to better highlight its originality.
2. Elaborate on the Edge-Assisted Generation’s Distinctiveness and Adaptation for High-Resolution Segmentation: The edge-assisted generation approa…

1. The author discovered that the introduction of edges can enhance the detail and performance of segmentation. They used batch discriminative embedding to distinguish between edges and segmentation. This is a novel method.
2. The author provided detailed experiments that demonstrate the method's strong performance across multiple aspects, and also included an ablation study to prove the effectiveness of each module.

1. The description of the one-step inference is not comprehensive enough; please see Q2.
2. For dichotomous segmentation, using an RGB 3-channel VAE to encode a single-channel segmentation mask might be a bit overkill. As an advancement in dichotomous segmentation, some earlier works have used diffusion models for matting, which also achieved very good results. However, considering that it can produce decent results in just one step, it is acceptable.

1. The proposed method reaches SOTA performance and beats other concurrent diffusion-based approaches on DIS datasets.
2. This is an early attempt that uses a pre-trained generative model for the challenging DIS task.
3. The method is efficient in comparison to the line of work that follows SegDiff, which runs the diffusion process for more time-steps.
4. The ablation studies include both quantitative numbers and qualitative visualizations, which are helpful for understanding how the whole framework is des…

1. The application scenario seems limited. The task setting is limited to Dichotomous Image Segmentation. It would be more convincing if the authors also addressed the applicability of this approach in more settings, e.g. image matting, foreground object segmentation, edge detection.
2. Running diffusion for one step for segmentation is not a great contribution. This paper might miss some related work that is in the line of DDPM-Seg [1]. A lot of recent work that uses StableDiffusion for…
Taxonomy
Topics: Medical Image Segmentation Techniques
Methods: Concatenated Skip Connection · Convolution · Diffusion · Max Pooling · U-Net
