3SGen: Unified Subject, Style, and Structure-Driven Image Generation with Adaptive Task-specific Memory

Xinyang Song; Libin Wang; Weining Wang; Zhiwei Li; Jianxin Sun; Dandan Zheng; Jingdong Chen; Qi Li; Zhenan Sun

arXiv:2512.19271·cs.CV·December 23, 2025

TL;DR

3SGen introduces a unified, task-aware image generation framework that effectively combines subject, style, and structure conditioning within a single model, utilizing adaptive memory to enhance task transferability and detail preservation.

Contribution

The paper presents 3SGen, a novel unified model with adaptive memory for simultaneous subject, style, and structure conditioning, improving task disentanglement and scalability.

Findings

01 Outperforms existing methods on multiple benchmarks.

02 Effectively disentangles and combines different conditioning modes.

03 Scales well to complex, compositional inputs.

Abstract

Recent image generation approaches often address subject, style, and structure-driven conditioning in isolation, leading to feature entanglement and limited task transferability. In this paper, we introduce 3SGen, a task-aware unified framework that performs all three conditioning modes within a single model. 3SGen employs an MLLM equipped with learnable semantic queries to align text-image semantics, complemented by a VAE branch that preserves fine-grained visual details. At its core, an Adaptive Task-specific Memory (ATM) module dynamically disentangles, stores, and retrieves condition-specific priors, such as identity for subjects, textures for styles, and spatial layouts for structures, via a lightweight gating mechanism along with several scalable memory items. This design mitigates inter-task interference and naturally scales to compositional inputs. In addition, we propose…
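
The gated memory retrieval that the abstract attributes to the ATM module can be pictured with a toy sketch. This is not the authors' implementation: the class name `ToyTaskMemory`, the softmax gate, the dot-product attention over memory items, and all sizes are assumptions chosen only to illustrate the idea of a lightweight gate routing a condition feature to per-task memory banks (subject, style, structure).

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class ToyTaskMemory:
    """Hypothetical gated memory: one bank of memory items per task.

    In a trained model the gate vectors and memory items would be learned;
    here they are random, so only the data flow is meaningful.
    """

    def __init__(self, tasks, num_items, dim, seed=0):
        rng = random.Random(seed)
        self.tasks = list(tasks)
        self.dim = dim
        # memory[task] holds num_items vectors of length dim.
        self.memory = {
            t: [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_items)]
            for t in self.tasks
        }
        # One gate vector per task (assumed form of the "lightweight gate").
        self.gate = {t: [rng.gauss(0, 1) for _ in range(dim)] for t in self.tasks}

    def gate_weights(self, feature):
        """Soft assignment of a condition feature to each task."""
        scores = [dot(self.gate[t], feature) for t in self.tasks]
        return dict(zip(self.tasks, softmax(scores)))

    def retrieve(self, feature):
        """Return a gate-weighted mix of attention readouts from every bank."""
        gates = self.gate_weights(feature)
        scale = math.sqrt(self.dim)
        out = [0.0] * self.dim
        for t in self.tasks:
            items = self.memory[t]
            # Attention of the feature over this task's memory items.
            attn = softmax([dot(item, feature) / scale for item in items])
            for weight, item in zip(attn, items):
                for i in range(self.dim):
                    out[i] += gates[t] * weight * item[i]
        return out, gates
```

Calling `ToyTaskMemory(["subject", "style", "structure"], num_items=4, dim=8).retrieve(feature)` returns the retrieved prior together with the per-task gate weights. Under this assumed design, a compositional input would simply spread gate mass across more than one bank, and adding memory items to a bank does not change the interface.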

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Aesthetic Perception and Analysis