MAST: Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer
Dongkyung Kang, Jaeyeon Hwang, Junseo Park, Minji Kang, Yeryeong Lee, Beomseok Ko, Hanyoung Roh, Jeongmin Shin, Hyeryung Jang

TL;DR
MAST is a training-free multi-style transfer framework that uses mask-guided attention to control style interactions, ensuring artifact-free, structure-preserving stylization across multiple styles.
Contribution
MAST introduces a mask-guided attention mechanism, organized into four modules, that extends diffusion-based style transfer to multiple styles without any training.
Findings
Effectively mitigates boundary artifacts in multi-style transfer.
Maintains structural consistency and texture fidelity.
Performs well even with multiple styles applied.
Abstract
Style transfer aims to render a content image with the visual characteristics of a reference style while preserving its underlying semantic layout and structural geometry. While recent diffusion-based models demonstrate strong stylization capabilities by leveraging powerful generative priors and controllable internal representations, they typically assume a single global style. Extending them to multi-style scenarios often leads to boundary artifacts, unstable stylization, and structural inconsistency due to interference between multiple style representations. To overcome these limitations, we propose MAST (Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer), a novel training-free framework that explicitly controls content-style interactions within the diffusion attention mechanism. To achieve artifact-free and structure-preserving stylization, MAST integrates…
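As a rough illustration of what mask-guided control over attention can look like, the sketch below lets each content query attend to every style's tokens separately and then blends the per-style results using region masks. This is not the MAST method, whose modules are not described in the text above. The function name `mask_guided_attention`, the per-style softmax, and the assumption that the masks sum to 1 at every query position are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mask_guided_attention(q, style_keys, style_values, masks):
    """Illustrative multi-style attention (hypothetical, not the paper's algorithm).

    q            : (N, d) content queries, one per spatial position
    style_keys   : list of S arrays, each (M_s, d)
    style_values : list of S arrays, each (M_s, d_v)
    masks        : (S, N) array; masks[s, n] is the share of attention mass
                   that query n gives to style s (assumed to sum to 1 over s)
    """
    n_queries, d = q.shape
    d_v = style_values[0].shape[-1]
    out = np.zeros((n_queries, d_v))
    for k, v, m in zip(style_keys, style_values, masks):
        # Attention restricted to one style's tokens; each row sums to 1.
        attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
        # Scale this style's contribution by its mask weight at each query.
        out += m[:, None] * (attn @ v)
    return out


# Usage: four query positions, two styles, hard left/right region masks.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
keys = [rng.normal(size=(5, 8)), rng.normal(size=(6, 8))]
values = [rng.normal(size=(5, 8)), rng.normal(size=(6, 8))]
masks = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])
stylized = mask_guided_attention(q, keys, values, masks)
```

With hard masks, each query draws only on the style assigned to its region, so tokens from different styles never mix inside a single softmax. Soft masks near a region boundary would blend the styles smoothly instead.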
