FA-Seg: A Fast and Accurate Diffusion-Based Method for Open-Vocabulary Segmentation

Huy Che; Vinh-Tiep Nguyen

arXiv:2506.23323 · cs.CV · April 30, 2026

TL;DR

FA-Seg is a training-free, diffusion-based framework for open-vocabulary segmentation. It uses a minimal (1+1)-step process from a pretrained diffusion model, combined with attention refinement techniques, to achieve both high accuracy and efficient inference.

Contribution

FA-Seg introduces a training-free, diffusion-based segmentation method that combines dual-prompt attention, hierarchical refinement, and test-time flipping to improve open-vocabulary segmentation.
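
Test-time flipping is a general augmentation idea: score the image and its horizontal mirror, un-mirror the second result, and average the two. The sketch below illustrates only that generic pattern with NumPy. The names `flip_averaged_scores`, `score_fn`, and `toy_score_fn` are hypothetical, and this listing does not describe how FA-Seg applies flipping to its attention maps, so this should not be read as the authors' implementation.

```python
import numpy as np

def flip_averaged_scores(image, score_fn):
    """Average per-class score maps over an image and its horizontal mirror.

    image:    array of shape (H, W, C)
    score_fn: hypothetical callable mapping (H, W, C) -> (K, H, W) class scores
    """
    scores = score_fn(image)
    # Mirror the image along the width axis, score it, then mirror the scores back
    # so both score maps are aligned with the original pixel grid.
    flipped_scores = score_fn(image[:, ::-1, :])[:, :, ::-1]
    return 0.5 * (scores + flipped_scores)

def toy_score_fn(image):
    # Stand-in for a real model (assumption): two "classes" scored by
    # per-pixel brightness and its complement.
    gray = image.mean(axis=-1)
    return np.stack([gray, 1.0 - gray])

image = np.random.default_rng(0).random((4, 6, 3))
avg = flip_averaged_scores(image, toy_score_fn)
labels = avg.argmax(axis=0)  # (H, W) label map
```

Because the toy scorer treats every pixel independently, averaging changes nothing here; with a real model the two passes generally differ, and averaging them can smooth out left-right inconsistencies.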

Findings

01. Achieves 43.8% average mIoU on multiple benchmarks.
02. Operates with only a (1+1)-step process from a pretrained diffusion model.
03. Surpasses prior state-of-the-art performance while maintaining high inference efficiency.
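
For readers unfamiliar with the metric, mIoU (mean intersection over union) averages the per-class overlap between predicted and ground-truth label maps. The following is a minimal single-image sketch under stated assumptions: `mean_iou` is a hypothetical helper, classes absent from both maps are skipped, and benchmark protocols usually accumulate counts over the whole dataset rather than per image. It is not the evaluation code behind the reported 43.8%.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two integer label maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps, so it is skipped
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

pred = np.array([[0, 0, 1],
                 [1, 1, 2]])
target = np.array([[0, 1, 1],
                   [1, 1, 2]])
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 0.5, 0.75, 1.0, so the mean is 0.75
```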

Abstract

Open-vocabulary semantic segmentation (OVSS) aims to segment objects from arbitrary text categories without requiring densely annotated datasets. Although contrastive-learning-based models enable zero-shot segmentation, they often lose fine spatial precision at the pixel level due to global representation bias. In contrast, diffusion-based models naturally encode fine-grained spatial features via attention mechanisms that capture both global context and local details. However, they often struggle to balance computational cost against segmentation mask quality. In this work, we present FA-Seg, a Fast and Accurate training-free framework for open-vocabulary segmentation based on diffusion models. FA-Seg performs segmentation using only a (1+1)-step process from a pretrained diffusion model. Moreover, instead of running multiple times for different classes, FA-Seg performs…

Code & Models

Repositories

chequanghuy/FA-Seg (GitHub)
