Training-Free Stylized Abstraction
Aimon Rahman, Kartik Narayan, Vishal M. Patel

TL;DR
This paper introduces a training-free method for stylized abstraction that generates visually exaggerated yet semantically faithful images from a single input, leveraging vision-language models and a novel flow inversion strategy.
Contribution
It presents a novel inference-time framework for stylized abstraction that does not require training, using style-aware features and a new rectified flow inversion for structure reconstruction.
Findings
Effective across diverse styles like LEGO and South Park
Generalizes well to unseen identities and styles
Supports multi-round abstraction without fine-tuning
Abstract
Stylized abstraction synthesizes visually exaggerated yet semantically faithful representations of subjects, balancing recognizability with perceptual distortion. Unlike image-to-image translation, which prioritizes structural fidelity, stylized abstraction demands selective retention of identity cues while embracing stylistic divergence, especially challenging for out-of-distribution individuals. We propose a training-free framework that generates stylized abstractions from a single image using inference-time scaling in vision-language models (VLLMs) to extract identity-relevant features, and a novel cross-domain rectified flow inversion strategy that reconstructs structure based on style-dependent priors. Our method adapts structural restoration dynamically through style-aware temporal scheduling, enabling high-fidelity reconstructions that honor both subject and style. It supports…
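The inversion idea mentioned in the abstract can be illustrated generically. The sketch below is not the paper's cross-domain strategy and does not model its style-aware temporal scheduling; it only shows what inverting a flow ODE means in the simplest case. It assumes a toy, hand-written velocity field (`velocity` is a hypothetical stand-in for a learned rectified flow model) and plain fixed-step Euler integration: an input is integrated backward from t = 1 to t = 0 to obtain a latent, then forward again to reconstruct it.

```python
import math


def velocity(x, t):
    """Toy stand-in for a learned velocity field v(x, t).

    Hypothetical: a real rectified flow model would be a neural network.
    """
    return [math.sin(t) - 0.5 * xi for xi in x]


def integrate(x, t_start, t_end, steps):
    """Fixed-step Euler integration of dx/dt = velocity(x, t)."""
    dt = (t_end - t_start) / steps  # negative when integrating backward
    t = t_start
    for _ in range(steps):
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        t += dt
    return x


def invert(image, steps=1000):
    """Map a data point at t = 1 back to a latent at t = 0."""
    return integrate(image, 1.0, 0.0, steps)


def generate(latent, steps=1000):
    """Map a latent at t = 0 forward to a data point at t = 1."""
    return integrate(latent, 0.0, 1.0, steps)


# A three-value vector stands in for an image.
image = [0.2, -1.0, 0.7]
latent = invert(image)
recon = generate(latent)

# Euler steps are not exactly reversible, so a small round-trip error remains.
error = max(abs(a - b) for a, b in zip(image, recon))
print("max round-trip error:", error)
```

With Euler steps the round trip is only approximately exact, and the residual error shrinks as `steps` grows. This is one reason inversion quality is a design concern for methods that must reconstruct structure from an inverted latent.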
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
