SDMatte: Grafting Diffusion Models for Interactive Matting

Longfei Huang; Yu Liang; Hao Zhang; Jinwei Chen; Wei Dong; Lunde Chen; Wanyu Liu; Bo Li; Peng-Tao Jiang

arXiv:2508.00443 · cs.CV · August 5, 2025

TL;DR

SDMatte is a diffusion-based interactive matting model. It converts the text-driven interaction capability of diffusion models into visual prompt-driven interaction, and it combines spatial and opacity information with a masked self-attention mechanism to better extract fine-grained detail in edge regions.

Contribution

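The paper presents a diffusion-driven interactive matting model that converts the text-driven interaction capability of diffusion models into visual prompt-driven interaction. It also integrates coordinate embeddings of visual prompts and opacity embeddings of target objects, and uses a masked self-attention mechanism.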
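As a rough illustration of what masking in self-attention means in general, the sketch below lets every token attend only to tokens flagged by a binary mask, for example tokens that fall inside a user-provided visual prompt. The function name `masked_self_attention`, the `key_mask` argument, the identity query/key/value projections, and the toy inputs are all assumptions made for illustration. This page does not describe how SDMatte builds or applies its mask, so the sketch should not be read as the paper's implementation.

```python
import math


def masked_self_attention(tokens, key_mask):
    """Generic masked self-attention over a short token sequence.

    tokens:   list of n vectors (lists of floats), all of length d.
    key_mask: list of n bools; True means the token may be attended to.
    Queries, keys, and values are the tokens themselves (identity
    projections), which is a simplification for illustration only.
    """
    if not any(key_mask):
        raise ValueError("at least one key must be unmasked")
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Scaled dot-product scores; masked keys get -inf so that
        # their softmax weight becomes exactly zero.
        scores = [
            sum(a * b for a, b in zip(q, k)) * scale if keep else -math.inf
            for k, keep in zip(tokens, key_mask)
        ]
        top = max(scores)  # finite, since at least one key is unmasked
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
    return outputs


# Hypothetical toy input: three 2-D tokens, with the third one masked out.
tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
attended = masked_self_attention(tokens, [True, True, False])
```

In this toy setup each output vector is a convex combination of the two unmasked tokens only, so the masked third token contributes nothing no matter how large its values are.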

Findings

01 Outperforms existing methods on multiple datasets

02 Effectively captures fine-grained edge details

03 Demonstrates robustness in interactive matting tasks

Abstract

Recent interactive matting methods have shown satisfactory performance in capturing the primary regions of objects, but they fall short in extracting fine-grained details in edge regions. Diffusion models trained on billions of image-text pairs demonstrate exceptional capability in modeling highly complex data distributions and synthesizing realistic texture details, while exhibiting robust text-driven interaction capabilities, making them an attractive solution for interactive matting. To this end, we propose SDMatte, a diffusion-driven interactive matting model, with three key contributions. First, we exploit the powerful priors of diffusion models and transform the text-driven interaction capability into visual prompt-driven interaction capability to enable interactive matting. Second, we integrate coordinate embeddings of visual prompts and opacity embeddings of target objects into…

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques