FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Tong Wu; Yinghao Xu; Ryan Po; Mengchen Zhang; Guandao Yang; Jiaqi Wang; Ziwei Liu; Dahua Lin; Gordon Wetzstein

arXiv:2412.07674·cs.CV·December 11, 2024

TL;DR

This paper introduces FiVA, a comprehensive dataset and framework for decomposing and transferring specific visual attributes like lighting and texture in text-to-image models, enabling more precise and customizable image generation.

Contribution

The work presents the first fine-grained visual attribute dataset (FiVA) and a novel adaptation framework (FiVA-Adapter) for improved attribute manipulation in image synthesis.

Findings

01. The FiVA dataset contains around 1 million images with detailed attribute annotations.

02. FiVA-Adapter enables selective transfer of visual attributes from multiple source images.

03. Together, these allow more customizable image generation with finer control over individual attributes.

Abstract

Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution involves adopting favorable attributes from the source images. Current methods attempt to distill identity and style from source images. However, "style" is a broad concept that includes texture, color, and artistic elements, but does not cover other important attributes such as lighting and dynamics. Additionally, a simplified "style" adaptation prevents combining multiple attributes from different sources into one generated image. In this work, we formulate a more effective approach to decompose the aesthetics of a picture into specific visual attributes, allowing users to apply characteristics such as lighting,…

Videos

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models (SlidesLive)

Taxonomy

Topics: Image Retrieval and Classification Techniques