TL;DR
This paper introduces a zero-shot image-to-image translation method that uses text-guided latent diffusion models to translate between skulls and living animals, with applications in paleontology and other fields.
Contribution
The paper proposes Revive-2I, a benchmark model for translation across large domain gaps that uses guided diffusion and text prompts, and reports that it outperforms traditional GANs on this task.
Findings
Guided diffusion models are effective for large domain gap translation.
Prompting with text provides scalable and flexible target domain guidance.
Traditional, unguided GANs fail at large domain gap translation.
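The guidance findings above can be illustrated with a toy numerical sketch. It is not the Revive-2I implementation: it assumes a DDPM-style linear noise schedule and classifier-free guidance, neither of which is specified in this summary, and every name in it (`make_alpha_bars`, `noise_latent`, `guided_noise`, `start_step`) is hypothetical. The idea shown is that a source latent is noised only part of the way, and each denoising step then mixes an unconditional noise prediction with a text-conditioned one.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention terms for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def start_step(strength, num_steps):
    """Map a strength in [0, 1] to the timestep at which denoising begins.

    Higher strength means more noise and therefore more freedom to change
    the image (hypothetical helper, not taken from the paper).
    """
    return max(0, min(num_steps - 1, int(round(strength * num_steps)) - 1))

def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t from the forward process q(z_t | z_0)."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bars[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps, eps

def guided_noise(eps_uncond, eps_text, scale):
    """Classifier-free guidance: push the prediction toward the text prompt."""
    return eps_uncond + scale * (eps_text - eps_uncond)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = make_alpha_bars()
    z0 = rng.standard_normal((4, 8, 8))   # stand-in for an encoded skull image
    t = start_step(0.5, len(alpha_bars))  # noise halfway, then denoise with text
    zt, _ = noise_latent(z0, t, alpha_bars, rng)
    # Stand-ins for a denoiser's two predictions at step t.
    eps_u = rng.standard_normal(z0.shape)
    eps_c = rng.standard_normal(z0.shape)
    print(t, zt.shape, guided_noise(eps_u, eps_c, 7.5).shape)
```

In a real pipeline the two noise predictions would come from a trained denoising network evaluated with and without the text prompt; here they are random placeholders so the arithmetic can be run in isolation.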
Abstract
With a strong understanding of the target domain from natural language, we produce promising results in translating across large domain gaps and bringing skeletons back to life. In this work, we use text-guided latent diffusion models for zero-shot image-to-image translation (I2I) across large domain gaps (longI2I), where large amounts of new visual features and new geometry need to be generated to enter the target domain. Being able to perform translations across large domain gaps has a wide variety of real-world applications in criminology, astrology, environmental conservation, and paleontology. In this work, we introduce a new task Skull2Animal for translating between skulls and living animals. On this task, we find that unguided Generative Adversarial Networks (GANs) are not capable of translating across large domain gaps. Instead of these traditional I2I methods, we explore the…
Taxonomy
Methods: Diffusion
