PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks

Arian Bakhtiarnia, Qi Zhang, and Alexandros Iosifidis

arXiv:2301.12914·cs.CV·February 1, 2023·1 cites

Open Access

TL;DR

PromptMix uses text-to-image diffusion models to generate synthetic training data that improves the performance of lightweight networks on dense regression tasks such as crowd counting, by up to 26%.

Contribution

This paper introduces PromptMix, a novel data augmentation method that uses text prompts and diffusion models to generate synthetic images for improving lightweight network performance.

Findings

01. Significant performance increase of up to 26% across five datasets.

02. Effective use of synthetic images generated from text prompts.

03. Enhanced training data improves lightweight network accuracy.

Abstract

Many deep learning tasks require annotations that are too time-consuming for human operators, resulting in small dataset sizes. This is especially true for dense regression problems such as crowd counting, which requires the location of every person in the image to be annotated. Techniques such as data augmentation and synthetic data generation based on simulations can help in such cases. In this paper, we introduce PromptMix, a method that artificially boosts the size of existing datasets and can be used to improve the performance of lightweight networks. First, synthetic images are generated in an end-to-end data-driven manner, where text prompts are extracted from existing datasets via an image captioning deep network, and subsequently introduced to text-to-image diffusion models. The generated images are then annotated using one or more high-performing deep networks, and mixed…
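The pipeline described in the abstract (caption, generate, annotate, mix) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `promptmix_sketch` and the three callables `caption_fn`, `generate_fn`, and `annotate_fn` are hypothetical placeholders for an image captioning network, a text-to-image diffusion model, and a high-performing annotation network. Because the abstract is truncated at the mixing step, combining synthetic and real samples by simple concatenation and shuffling is an assumption.

```python
import random
from typing import Any, Callable, List, Tuple

# A training sample is an (image, annotation) pair; both types are left open.
Sample = Tuple[Any, Any]


def promptmix_sketch(
    real: List[Sample],
    caption_fn: Callable[[Any], str],   # hypothetical: image -> text prompt
    generate_fn: Callable[[str], Any],  # hypothetical: text prompt -> synthetic image
    annotate_fn: Callable[[Any], Any],  # hypothetical: image -> pseudo-annotation
    images_per_prompt: int = 1,
    seed: int = 0,
) -> List[Sample]:
    """Return the real samples plus pseudo-annotated synthetic samples."""
    # Step 1: extract one text prompt per real image via captioning.
    prompts = [caption_fn(image) for image, _ in real]

    # Step 2: generate synthetic images from each prompt, and
    # Step 3: annotate each generated image with the stronger network.
    synthetic: List[Sample] = []
    for prompt in prompts:
        for _ in range(images_per_prompt):
            image = generate_fn(prompt)
            synthetic.append((image, annotate_fn(image)))

    # Step 4 (assumption): mix by concatenating and shuffling.
    mixed = list(real) + synthetic
    random.Random(seed).shuffle(mixed)
    return mixed
```

With stub callables in place of real models, the function returns a dataset of size `len(real) * (1 + images_per_prompt)`; the mixed set would then be used to train the lightweight network.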

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Multimodal Machine Learning Applications

Methods: Diffusion