TL;DR
OmniAlpha is a unified multi-task reinforcement learning framework for transparency-aware image generation and manipulation. It jointly optimizes RGB and alpha channels across tasks and outperforms baseline models on five transparency-aware tasks.
Contribution
OmniAlpha presents a multi-task RL approach that combines an end-to-end alpha-aware VAE with a sequence-to-sequence Diffusion Transformer for unified RGBA generation and manipulation.
Findings
Achieves a 9.07% reduction in RGB L1 error on layer decomposition.
Outperforms specialized matting models, with improvements of 74% on SAD and 68% on Grad.
Consistently outperforms baseline models across five transparency-aware tasks.
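For readers unfamiliar with the metrics named above, here is a minimal sketch of how SAD (sum of absolute differences) and a mean L1 error are commonly computed, and how a percentage improvement can be derived from two error values. It is not the paper's evaluation code. The function names and toy values are hypothetical, and reading the reported percentages as relative reductions against a baseline is an assumption.

```python
from typing import Sequence

# An alpha matte (or a single image channel) as rows of floats in [0, 1].
Grid = Sequence[Sequence[float]]


def sad(pred: Grid, target: Grid) -> float:
    """Sum of absolute differences over all pixels."""
    return sum(
        abs(p - t)
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )


def mean_l1(pred: Grid, target: Grid) -> float:
    """Mean absolute error: SAD divided by the number of pixels."""
    num_pixels = sum(len(row) for row in target)
    return sad(pred, target) / num_pixels


def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage by which `ours` lowers an error metric relative to `baseline`.

    Assumption: this is how a figure such as "9.07% reduction" would be
    derived; the summary does not state the exact definition.
    """
    return 100.0 * (baseline - ours) / baseline


# Toy 2x2 alpha mattes (hypothetical values, not data from the paper).
pred = [[0.0, 0.5], [1.0, 1.0]]
target = [[0.0, 1.0], [1.0, 0.5]]

print(sad(pred, target))             # 1.0
print(mean_l1(pred, target))         # 0.25
print(relative_reduction(4.0, 1.0))  # 75.0
```

Both SAD and L1 are error metrics, so lower is better and an "improvement" corresponds to a reduction in the value.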
Abstract
Transparency-aware generation requires modeling not only RGB appearance but also alpha-based opacity and cross-layer composition, which are essential for tasks such as image matting, object removal, layer decomposition, and multi-layer content creation. However, existing RGBA-related methods remain largely fragmented, with separate pipelines designed for individual tasks. While a unified model is desirable, supervised fine-tuning alone is insufficient, as localized regression objectives cannot directly optimize the compositional fidelity, alpha-boundary precision, and structural consistency required for high-quality RGBA generation. To address this, we propose OmniAlpha, a unified multi-task reinforcement learning framework for transparency-aware generation and manipulation. OmniAlpha combines an end-to-end alpha-aware VAE and a sequence-to-sequence Diffusion Transformer, with a…
