Alfie: Democratising RGBA Image Generation With No $$$
Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Rita Cucchiara

TL;DR
Alfie introduces a cost-effective method to generate high-quality RGBA images using a pre-trained Diffusion Transformer, enabling seamless integration into creative workflows without additional training or resources.
Contribution
This work presents a novel inference-time modification of a pre-trained Diffusion Transformer to generate RGBA images with irregular shapes, enhancing accessibility and usability in design and artistic applications.
Findings
Users prefer Alfie's generated images over traditional matting methods.
Generated illustrations integrate well into composite scene pipelines.
The approach requires no additional training or computational resources.
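The composition finding rests on the alpha channel: an RGBA cutout can be blended over any background with the standard "over" operator. The following is a minimal sketch of that operator in plain Python, assuming straight (non-premultiplied) alpha and float channels in [0, 1]. It is not Alfie's code, and the names `over` and `composite` are hypothetical, chosen only for illustration.

```python
def over(fg, bg):
    """Composite one straight-alpha RGBA pixel `fg` over an opaque RGB pixel `bg`.

    All channels are floats in [0, 1]. Returns an opaque RGB tuple.
    """
    r, g, b, a = fg
    # Standard "over" blend: out = alpha * foreground + (1 - alpha) * background
    return tuple(a * c_fg + (1.0 - a) * c_bg for c_fg, c_bg in zip((r, g, b), bg))


def composite(foreground, background):
    """Blend an RGBA image over an RGB image of the same size (lists of rows)."""
    return [
        [over(fg_px, bg_px) for fg_px, bg_px in zip(fg_row, bg_row)]
        for fg_row, bg_row in zip(foreground, background)
    ]


# A 1x3 red cutout: opaque, half-transparent, and fully transparent pixels.
cutout = [[(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.5), (1.0, 0.0, 0.0, 0.0)]]
scene = [[(0.0, 0.0, 1.0)] * 3]  # solid blue background
result = composite(cutout, scene)
```

Where the cutout is fully transparent the background shows through unchanged, which is why an image with a clean alpha channel can be dropped into a scene without a separate matting step.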
Abstract
Designs and artworks are ubiquitous across various creative fields, requiring graphic design skills and dedicated software to create compositions that include many graphical elements, such as logos, icons, symbols, and art scenes, which are integral to visual storytelling. Automating the generation of such visual elements improves graphic designers' productivity, democratizes and innovates the creative industry, and helps generate more realistic synthetic data for related tasks. These illustration elements are mostly RGBA images with irregular shapes and cutouts, facilitating blending and scene composition. However, most image generation models are incapable of generating such images, and achieving this capability requires expensive computational resources, specific training recipes, or post-processing solutions. In this work, we propose a fully-automated approach for obtaining RGBA…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis
Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings
