IntentTuner: An Interactive Framework for Integrating Human Intents in Fine-tuning Text-to-Image Generative Models

Xingchen Zeng, Ziyao Gao, Yilin Ye, and Wei Zeng

arXiv:2401.15559·cs.HC·January 30, 2024·2 cites

Open Access

TL;DR

IntentTuner is an interactive framework that enhances fine-tuning of text-to-image models by integrating human intentions through user-friendly tools and new metrics, improving model alignment and reducing effort.

Contribution

IntentTuner introduces an interactive system for incorporating human intentions into fine-tuning, together with novel metrics for measuring intent alignment and an improved user experience.

Findings

01 Reduces cognitive effort in the fine-tuning process

02 Produces models with better alignment to user intentions

03 Streamlines the fine-tuning workflow

Abstract

Fine-tuning facilitates the adaptation of text-to-image generative models to novel concepts (e.g., styles and portraits), empowering users to forge creatively customized content. Recent efforts on fine-tuning focus on reducing training data and lightening computation overload but neglect alignment with user intentions, particularly in manual curation of multi-modal training data and intent-oriented evaluation. Informed by a formative study with fine-tuning practitioners for comprehending user intentions, we propose IntentTuner, an interactive framework that intelligently incorporates human intentions throughout each phase of the fine-tuning workflow. IntentTuner enables users to articulate training intentions with imagery exemplars and textual descriptions, automatically converting them into effective data augmentation strategies. Furthermore, IntentTuner introduces novel metrics to…

Taxonomy

Topics: Human Motion and Animation · 3D Modeling in Geospatial Applications · Image Processing and 3D Reconstruction