Unsupervised Structural Scene Decomposition via Foreground-Aware Slot Attention with Pseudo-Mask Guidance

Huankun Sheng; Ming Li; Yixiang Wei; Yeying Fan; Yu-Hui Wen; Tieliang Gong; Yong-Jin Liu

arXiv:2512.02685 · cs.CV · December 11, 2025

Open Access

TL;DR

This paper introduces Foreground-Aware Slot Attention (FASA), a two-stage unsupervised scene decomposition method that explicitly separates foreground from background and uses pseudo-mask guidance to improve object discovery in complex scenes.

Contribution

FASA is described as the first method to explicitly model foreground-background separation in unsupervised scene decomposition, using a two-stage approach with pseudo-mask guidance to improve the accuracy of object representations.

Findings

01. FASA outperforms existing methods on synthetic and real datasets.

02. Explicit foreground modeling improves scene decomposition quality.

03. Pseudo-mask guidance reduces over-segmentation of objects.

Abstract

Recent advances in object-centric representation learning have shown that slot attention-based methods can effectively decompose visual scenes into object slot representations without supervision. However, existing approaches typically process foreground and background regions indiscriminately, often resulting in background interference and suboptimal instance discovery performance on real-world data. To address this limitation, we propose Foreground-Aware Slot Attention (FASA), a two-stage framework that explicitly separates foreground from background to enable precise object discovery. In the first stage, FASA performs a coarse scene decomposition to distinguish foreground from background regions through a dual-slot competition mechanism. These slots are initialized via a clustering-based strategy, yielding well-structured representations of salient regions. In the second stage, we…
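The first stage described in the abstract (two slots competing for image regions, initialized by clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes plain k-means for the clustering-based initialization, omits the learned projections and recurrent update of standard slot attention, and the function names `kmeans_init` and `dual_slot_competition` are hypothetical. It only shows the core idea that a softmax over the slot axis forces each feature to be shared between exactly two slots.

```python
import numpy as np

def kmeans_init(features, k=2, iters=10, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering-based
    initialization); returns k centroids to use as initial slots."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every feature to every centroid: (N, k)
        dist = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids

def dual_slot_competition(features, iters=3, seed=0):
    """Simplified two-slot attention without learned parameters.

    features: (N, D) array of per-location features.
    Returns (attn, slots): attn is (N, 2) with rows summing to 1,
    slots is (2, D).
    """
    n, d = features.shape
    slots = kmeans_init(features, k=2, seed=seed)
    for _ in range(iters):
        logits = features @ slots.T / np.sqrt(d)        # (N, 2)
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)         # softmax over slots: competition
        # Normalize over locations so each slot becomes a weighted mean of features
        weights = attn / (attn.sum(axis=0, keepdims=True) + 1e-8)
        slots = weights.T @ features                    # (2, D)
    return attn, slots
```

Taking `attn.argmax(axis=1)` gives a hard two-way partition of the locations. Deciding which of the two slots corresponds to foreground is a separate step that this sketch does not address.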

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning