From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation
Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng, Tianrui Niu, Xin Yang, Ruidong Pan

TL;DR
This paper introduces RLFSeg, a framework that uses Rectified Flow to map images directly to segmentation masks. It outperforms diffusion-based methods, especially in zero-shot scenarios.
Contribution
It proposes a novel Rectified Flow-based approach that improves segmentation accuracy without modifying the pretrained generative model.
Findings
RLFSeg outperforms previous diffusion-based segmentation methods.
The model achieves high accuracy with only a single inference step.
Label refinement and adaptive sampling enhance segmentation precision.
Abstract
Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope than traditional fixed-category segmentation tasks. Recent studies have shown that diffusion models (e.g., Stable Diffusion) can provide rich multimodal semantic features, which has led to work that uses diffusion models as feature extractors for segmentation tasks. Such methods, however, inherit the generative nature of diffusion models, which is harmful to discriminative segmentation tasks. In response, we propose RLFSeg, a novel framework that leverages Rectified Flow to learn a direct mapping from the image to the segmentation mask within the latent space. The model is thus freed from the noise-denoise process and the need to optimize the time step of diffusion models, resulting in substantially better performance than…
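To make the idea of a direct latent mapping with single-step inference concrete, here is a minimal NumPy sketch of the generic Rectified Flow formulation: a straight path between two latents, a constant-velocity regression target, and one Euler step from t = 0 to t = 1. This summary does not describe RLFSeg's network, latent encoder, or loss, so every name below (`interpolate`, `velocity_target`, `flow_matching_loss`, `one_step_predict`, `z_img`, `z_mask`) is a hypothetical placeholder, and the "oracle" velocity field stands in for a trained model purely as a sanity check.

```python
import numpy as np


def interpolate(z_img, z_mask, t):
    """Point on the straight path from the image latent (t=0) to the mask latent (t=1)."""
    return (1.0 - t) * z_img + t * z_mask


def velocity_target(z_img, z_mask):
    """Generic Rectified Flow target: the constant velocity along the straight path."""
    return z_mask - z_img


def flow_matching_loss(velocity_fn, z_img, z_mask, t):
    """Mean squared error between the predicted and the target velocity at time t."""
    z_t = interpolate(z_img, z_mask, t)
    pred = velocity_fn(z_t, t)
    return float(np.mean((pred - velocity_target(z_img, z_mask)) ** 2))


def one_step_predict(velocity_fn, z_img):
    """Single Euler step of size 1 from t=0: z_mask_hat = z_img + v(z_img, 0)."""
    return z_img + velocity_fn(z_img, 0.0)


# Toy latents (assumed shape: channels x height x width); not real encoder outputs.
rng = np.random.default_rng(0)
z_img = rng.normal(size=(4, 8, 8))
z_mask = rng.normal(size=(4, 8, 8))


def oracle(z_t, t):
    """Stand-in for a trained network: returns the true straight-path velocity."""
    return z_mask - z_img


loss = flow_matching_loss(oracle, z_img, z_mask, t=0.3)
z_pred = one_step_predict(oracle, z_img)
```

With a perfectly straight flow, one Euler step lands exactly on the target latent, which is the property that makes single-step inference plausible. A learned velocity field only approximates this, so the sketch says nothing about the accuracy the paper reports.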
