PokeFusion Attention: A Lightweight Cross-Attention Mechanism for Style-Conditioned Image Generation

Jingbang Tang

arXiv:2602.03220 · cs.CV · March 30, 2026

TL;DR

PokeFusion Attention introduces a lightweight, decoder-level cross-attention mechanism that models style as a learned prior, enabling efficient style-conditioned image generation without external references.

Contribution

It presents a parameter-efficient, plug-and-play style conditioning method that improves style fidelity and structural consistency in diffusion-based image generation.

Findings

01. Enhances style fidelity and semantic alignment in stylized character generation.
02. Maintains low parameter overhead and simple inference.
03. Outperforms adapter-based baselines in style-conditioned generation.

Abstract

Style-conditioned text-to-image (T2I) generation with diffusion models requires both stable character structure and consistent, fine-grained style expression across diverse prompts. Existing approaches either rely on text-only prompting, which is often insufficient to specify visual style, or introduce reference-based adapters that depend on external images at inference time, increasing system complexity and limiting deployment flexibility. We propose PokeFusion Attention, a lightweight decoder-level cross-attention mechanism that models style as a learned distributional prior rather than instance-level conditioning. The method integrates textual semantics with learned style embeddings directly within the diffusion decoder, enabling effective stylized generation without requiring reference images at inference time. Only the cross-attention layers and a compact style projection module…
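To make the idea of combining textual semantics with learned style embeddings inside a cross-attention layer more concrete, the following is a minimal single-head sketch in plain Python. It is not the paper's implementation: the abstract does not say how the two sources are merged, so concatenating text tokens and style tokens into one key/value context is an assumption made here for illustration. The function name `style_cross_attention`, the helper names, the dimensions, and the random weights are all hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian matrix standing in for learned parameters or features."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def style_cross_attention(latents, text_tokens, style_tokens, w_q, w_k, w_v):
    """Single-head cross-attention from image latents to text plus style tokens.

    ASSUMPTION: text and style tokens share one context sequence
    (simple concatenation along the token axis). The paper may merge
    them differently.
    """
    context = text_tokens + style_tokens  # list concatenation, token axis
    q = matmul(latents, w_q)   # (n_latents, d_head)
    k = matmul(context, w_k)   # (n_context, d_head)
    v = matmul(context, w_v)   # (n_context, d_head)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Scaled dot-product scores between every query and every key.
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Toy usage with hypothetical sizes.
rng = random.Random(0)
d_latent, d_ctx, d_head = 8, 6, 4
latents = random_matrix(5, d_latent, rng)      # 5 spatial positions
text_tokens = random_matrix(3, d_ctx, rng)     # 3 text-encoder tokens
style_tokens = random_matrix(2, d_ctx, rng)    # 2 learned style embeddings (hypothetical)
w_q = random_matrix(d_latent, d_head, rng)
w_k = random_matrix(d_ctx, d_head, rng)
w_v = random_matrix(d_ctx, d_head, rng)

out, weights = style_cross_attention(latents, text_tokens, style_tokens, w_q, w_k, w_v)
```

In this sketch the style tokens are ordinary parameters rather than features extracted from a reference image, which mirrors the abstract's point that no reference image is needed at inference time. Each latent position distributes its attention over the three text tokens and the two style tokens together.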
