SubZero: Composing Subject, Style, and Action via Zero-Shot Personalization

Shubhankar Borse; Kartikeya Bhardwaj; Mohammad Reza Karimi Dastjerdi; Hyojin Park; Shreya Kadambi; Shobitha Shivakumar; Prathamesh Mandke; Ankita Nayak; Harris Teague; Munawar Hayat; Fatih Porikli

arXiv:2502.19673·cs.CV·February 28, 2025

Open Access

TL;DR

SubZero is a tuning-free framework for generating personalized subjects, styles, and actions with diffusion models. Because it needs no fine-tuning, it is suited to personalization on mobile devices.

Contribution

The paper presents a tuning-free method that pairs a set of constraints, designed to enhance subject and style similarity while reducing leakage, with orthogonalized temporal aggregation for personalized composition in diffusion models.

Findings

1. Significant improvement over state-of-the-art methods in subject, style, and action composition.

2. Effective reduction of content and style leakage artifacts.

3. Compatible with edge devices for real-time personalized generation.

Abstract

Diffusion models are increasingly popular for generative tasks, including personalized composition of subjects and styles. While diffusion models can generate user-specified subjects performing text-guided actions in custom styles, they require fine-tuning and are not feasible for personalization on mobile devices. Hence, tuning-free personalization methods such as IP-Adapters have progressively gained traction. However, for the composition of subjects and styles, these works are less flexible due to their reliance on ControlNet, or show content and style leakage artifacts. To tackle these, we present SubZero, a novel framework to generate any subject in any style, performing any action without the need for fine-tuning. We propose a novel set of constraints to enhance subject and style similarity, while reducing leakage. Additionally, we propose an orthogonalized temporal aggregation…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games · Human Motion and Animation

Methods: Diffusion · Sparse Evolutionary Training