aMUSEd: An Open MUSE Reproduction

Suraj Patil; William Berman; Robin Rombach; Patrick von Platen

arXiv:2401.01808 · cs.CV · January 4, 2024 · 1 citation

TL;DR

aMUSEd is a lightweight, open-source masked image model (MIM) for fast text-to-image generation. It uses 10 percent of MUSE's parameters, needs fewer inference steps than latent diffusion, is more interpretable, and can be fine-tuned to a new style from a single image.

Contribution

The paper presents aMUSEd, a compact MIM based on MUSE and aimed at fast text-to-image generation. It releases reproducible training code and checkpoints for two models that directly produce images at 256x256 and 512x512 resolutions.

Findings

01

aMUSEd achieves fast image generation with 10% of MUSE's parameters.

02

It requires fewer inference steps than latent diffusion models.

03

MIM can be fine-tuned to learn an additional style from only a single image.

Abstract

We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.
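To make the "fewer inference steps" claim concrete, the sketch below illustrates the general idea of iterative parallel decoding used by masked image models: all image tokens start masked, and each step commits the most confident predictions while a schedule shrinks the number of tokens left masked. This is a minimal illustration, not the paper's implementation. The cosine schedule, the 256-token grid, the 12 steps, the vocabulary size, and the random stand-in for the transformer are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math
import random

MASK = -1  # hypothetical sentinel for a masked token position


def cosine_mask_schedule(num_tokens: int, num_steps: int) -> list[int]:
    """Return how many tokens remain masked after each decoding step.

    Assumes a cosine schedule: the masked fraction falls from 1 to 0
    as the step index goes from 0 to num_steps.
    """
    masked_after = []
    for step in range(1, num_steps + 1):
        ratio = math.cos(math.pi / 2 * step / num_steps)
        masked_after.append(int(math.floor(num_tokens * ratio)))
    masked_after[-1] = 0  # the final step must unmask everything
    return masked_after


def toy_decode(num_tokens: int, num_steps: int,
               vocab_size: int = 8192, seed: int = 0) -> list[int]:
    """Fill a fully masked token sequence in num_steps parallel steps.

    A random number generator stands in for the model: for every masked
    position it "predicts" a token id and a confidence score.
    """
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for keep_masked in cosine_mask_schedule(num_tokens, num_steps):
        masked_positions = [i for i, t in enumerate(tokens) if t == MASK]
        # Stand-in for one forward pass over all masked positions at once.
        predictions = {
            i: (rng.randrange(vocab_size), rng.random())
            for i in masked_positions
        }
        # Commit the most confident predictions; leave the rest masked.
        ranked = sorted(masked_positions,
                        key=lambda i: predictions[i][1], reverse=True)
        num_to_commit = len(masked_positions) - keep_masked
        for i in ranked[:num_to_commit]:
            tokens[i] = predictions[i][0]
    return tokens


if __name__ == "__main__":
    schedule = cosine_mask_schedule(num_tokens=256, num_steps=12)
    print("masked tokens remaining per step:", schedule)
    image_tokens = toy_decode(num_tokens=256, num_steps=12)
    print("all tokens filled:", MASK not in image_tokens)
```

Under these assumptions, every token is produced in a fixed, small number of forward passes, because each pass predicts many positions in parallel. In a real model, the committed token ids would then be decoded to pixels by an image tokenizer's decoder.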

Code & Models

Repositories

huggingface/amused (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Masked Image Modeling (MIM)