TL;DR
Shape2Animal is a framework that reinterprets the silhouettes of natural objects, such as clouds, stones, or flames, as plausible animal images using vision-language models and text-to-image diffusion, with applications in visual storytelling and digital art.
Contribution
It introduces an automated pipeline that combines open-vocabulary segmentation, vision-language models, and a text-to-image diffusion model to reinterpret object silhouettes as animals and blend the generated animal back into the original scene.
Findings
Demonstrates robustness across diverse real-world silhouettes.
Produces visually coherent and spatially consistent animal images.
Offers new opportunities for digital art and educational content.
Abstract
Humans possess a unique ability to perceive meaningful patterns in ambiguous stimuli, a cognitive phenomenon known as pareidolia. This paper introduces the Shape2Animal framework, which mimics this imaginative capacity by reinterpreting natural object silhouettes, such as clouds, stones, or flames, as plausible animal forms. Our automated framework first performs open-vocabulary segmentation to extract the object silhouette and uses vision-language models to infer semantically appropriate animal concepts. It then synthesizes an animal image that conforms to the input shape using a text-to-image diffusion model and seamlessly blends it into the original scene to generate visually coherent and spatially consistent compositions. We evaluated Shape2Animal on a diverse set of real-world inputs, demonstrating its robustness and creative potential. Our Shape2Animal can offer new opportunities for…
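The four stages named in the abstract (segment, interpret, synthesize, blend) can be sketched as a small pipeline. This is only an illustrative outline, not the authors' implementation: the names `shape_to_animal`, `segment`, `propose_animal`, `generate`, and `composite` are hypothetical, the three model stages are passed in as placeholder callables, and the blending step is assumed here to be a simple per-pixel alpha blend over a soft mask, which the abstract does not specify.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image: rows of floats in [0, 1]
Mask = List[List[float]]   # soft silhouette mask: 1.0 inside the shape, 0.0 outside


def composite(scene: Image, generated: Image, mask: Mask) -> Image:
    """Alpha-blend `generated` into `scene`, weighted per pixel by `mask`.

    Assumption: a plain linear blend stands in for the paper's blending step.
    """
    return [
        [m * g + (1.0 - m) * s for s, g, m in zip(s_row, g_row, m_row)]
        for s_row, g_row, m_row in zip(scene, generated, mask)
    ]


def shape_to_animal(
    scene: Image,
    segment: Callable[[Image], Mask],            # hypothetical: open-vocabulary segmentation
    propose_animal: Callable[[Mask], str],       # hypothetical: vision-language model
    generate: Callable[[str, Mask], Image],      # hypothetical: shape-conditioned diffusion
) -> Image:
    """Run the four stages in order and return the blended composition."""
    mask = segment(scene)                  # 1. extract the object silhouette
    animal = propose_animal(mask)          # 2. pick an animal concept for the shape
    generated = generate(animal, mask)     # 3. synthesize an animal matching the shape
    return composite(scene, generated, mask)  # 4. blend into the original scene


# Toy usage with stub stages; real models would replace the lambdas.
scene = [[0.2, 0.2], [0.2, 0.2]]
result = shape_to_animal(
    scene,
    segment=lambda img: [[1.0, 0.0], [0.5, 0.0]],
    propose_animal=lambda mask: "rabbit",
    generate=lambda prompt, mask: [[1.0, 1.0], [1.0, 1.0]],
)
```

Passing the model stages as callables keeps the sketch independent of any particular segmentation, vision-language, or diffusion library; pixels where the mask is 0 are left exactly as they were in the scene.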
