AnimeAdapter: Fine-grained and Consistent Zero-shot Anime Character Generation
Yixuan Han

TL;DR
AnimeAdapter is a lightweight zero-shot anime character generator that injects visual features from a single reference image into Stable Diffusion, giving controllable and consistent results across diverse editing scenarios without per-subject fine-tuning.
Contribution
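The paper introduces an appearance adapter that combines semantic-selective local attention, built on CLIP's emergent local spatialization, with pose-aware conditioning during training. It also contributes a new anime character dataset. The adapter is compatible with existing Stable Diffusion community workflows.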
Findings
Enables controllable anime character generation without per-subject fine-tuning at deployment time.
Maintains consistency across diverse editing conditions.
Achieves high-quality results in practical editing scenarios.
Abstract
We present a lightweight appearance adapter for Stable Diffusion that enables controllable and consistent anime character generation under diverse editing conditions. Instead of relying on large-scale vision-language models or per-subject fine-tuning, our method injects fine-grained visual features from a single reference image into the diffusion process. Building on CLIP's emergent local spatialization, we develop semantic-selective local attention. To further disentangle character appearance from spatial layout, we incorporate pose-aware conditioning during adapter training. The resulting pretrained adapter remains compact, modular, and fully compatible with Stable Diffusion community workflows, while requiring no additional fine-tuning at deployment time. Furthermore, we present a high-quality anime character dataset based on curated and restructured Danbooru prompts, and evaluate our…
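The abstract does not spell out how the reference features are selected or injected, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes (hypothetically) that "semantic-selective" means keeping the CLIP patch tokens most similar to the global image token, and that injection is a residual cross-attention from diffusion latent tokens to the kept patches. All function names, shapes, and the selection rule are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def select_patch_tokens(patch_tokens, global_token, keep):
    """Keep the `keep` patch tokens most cosine-similar to the global token.

    ASSUMPTION: this top-k rule is a stand-in for the paper's
    semantic-selective step, which the abstract does not describe.
    patch_tokens: (M, C), global_token: (C,)
    Returns the kept tokens (in original spatial order) and their indices.
    """
    p = patch_tokens / np.linalg.norm(patch_tokens, axis=1, keepdims=True)
    g = global_token / np.linalg.norm(global_token)
    scores = p @ g                                  # (M,) cosine similarities
    idx = np.sort(np.argsort(-scores)[:keep])       # top-k, spatial order kept
    return patch_tokens[idx], idx


def adapter_cross_attention(latent_queries, ref_tokens, w_k, w_v, scale=1.0):
    """Residual cross-attention from latent tokens to reference tokens.

    ASSUMPTION: a generic adapter-style injection; `scale` controls how
    strongly the reference appearance is mixed into the latents.
    latent_queries: (N, D), ref_tokens: (K, C), w_k and w_v: (C, D)
    """
    k = ref_tokens @ w_k                            # (K, D)
    v = ref_tokens @ w_v                            # (K, D)
    attn = softmax(latent_queries @ k.T / np.sqrt(k.shape[1]), axis=-1)
    return latent_queries + scale * (attn @ v)      # (N, D)


# Toy usage with random stand-ins for CLIP features and UNet latents.
rng = np.random.default_rng(0)
patches = rng.normal(size=(49, 16))     # e.g. a 7x7 grid of patch tokens
cls_token = rng.normal(size=16)         # global image token
queries = rng.normal(size=(64, 32))     # flattened latent tokens
w_k = rng.normal(size=(16, 32)) * 0.1   # hypothetical adapter projections
w_v = rng.normal(size=(16, 32)) * 0.1

kept, kept_idx = select_patch_tokens(patches, cls_token, keep=12)
out = adapter_cross_attention(queries, kept, w_k, w_v, scale=0.8)
```

Setting `scale=0.0` returns the latents unchanged, which is the usual way such adapters stay modular: the base model's behaviour is recovered when the adapter branch is switched off.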
