Alfie: Democratising RGBA Image Generation With No $$$

Fabio Quattrini; Vittorio Pippi; Silvia Cascianelli; Rita Cucchiara

arXiv:2408.14826 · cs.CV · August 28, 2024

TL;DR

Alfie introduces a cost-effective method to generate high-quality RGBA images using a pre-trained Diffusion Transformer, enabling seamless integration into creative workflows without additional training or resources.

Contribution

This work presents a novel inference-time modification of a pre-trained Diffusion Transformer to generate RGBA images with irregular shapes, enhancing accessibility and usability in design and artistic applications.

Findings

01. Users prefer Alfie's generated images over images produced by traditional matting methods.

02. Generated illustrations integrate well into composite scene pipelines.

03. The approach requires neither additional training nor additional computational resources.

Abstract

Designs and artworks are ubiquitous across various creative fields, requiring graphic design skills and dedicated software to create compositions that include many graphical elements, such as logos, icons, symbols, and art scenes, which are integral to visual storytelling. Automating the generation of such visual elements improves graphic designers' productivity, democratizes and innovates the creative industry, and helps generate more realistic synthetic data for related tasks. These illustration elements are mostly RGBA images with irregular shapes and cutouts, facilitating blending and scene composition. However, most image generation models are incapable of generating such images, and achieving this capability requires expensive computational resources, specific training recipes, or post-processing solutions. In this work, we propose a fully-automated approach for obtaining RGBA…
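To make concrete why an alpha channel facilitates blending and scene composition, the following is a minimal sketch of the standard "over" compositing operator for straight (non-premultiplied) RGBA values. It is not part of Alfie's method and does not reproduce anything from the paper; the names `over` and `composite` and the nested-list image representation are assumptions chosen purely for illustration.

```python
from typing import List, Tuple

# One pixel as (R, G, B, A), each channel in [0, 1], straight (non-premultiplied) alpha.
Pixel = Tuple[float, float, float, float]
Image = List[List[Pixel]]


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite one pixel over another with the standard 'over' operator.

    Hypothetical helper for illustration only.
    """
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom

    # Resulting coverage: the top pixel plus whatever of the bottom shows through.
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels fully transparent: colour is undefined, return transparent black.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        # Alpha-weighted colour mix, normalised by the resulting coverage.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(top_img: Image, bottom_img: Image) -> Image:
    """Apply `over` pixel by pixel to two images of identical size."""
    return [
        [over(t, b) for t, b in zip(top_row, bottom_row)]
        for top_row, bottom_row in zip(top_img, bottom_img)
    ]


if __name__ == "__main__":
    half_red = (1.0, 0.0, 0.0, 0.5)      # semi-transparent red illustration pixel
    opaque_blue = (0.0, 0.0, 1.0, 1.0)   # opaque blue background pixel
    print(over(half_red, opaque_blue))
```

Under these assumptions, a fully transparent foreground pixel leaves the background unchanged, which is what lets an irregularly shaped cutout sit on any scene without a visible rectangular border.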

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis

Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings