Towards Evaluating Robustness of Prompt Adherence in Text to Image Models
Sujith Vemishetty, Advitiya Arora, Anupama Sharma

TL;DR
This paper develops a framework to evaluate the robustness of Text-to-Image models in adhering to prompts, revealing their struggles with simple variations and distribution conformity, and introduces a novel dataset and evaluation pipeline.
Contribution
It introduces a comprehensive evaluation framework, a novel dataset, and a pipeline for assessing prompt adherence and robustness in Text-to-Image models.
Findings
Models struggle with simple binary images and factors of variation.
Pre-trained VAEs fail to generate images that match the dataset distribution.
Evaluation pipeline effectively measures prompt adherence and robustness.
Abstract
The advancements in the domain of LLMs in recent years have surprised many, showcasing their remarkable capabilities and diverse applications. Their potential applications in various real-world scenarios have led to significant research on their reliability and effectiveness. On the other hand, multimodal LLMs and Text-to-Image models have only recently gained prominence, especially when compared to text-only LLMs. Their reliability remains constrained due to insufficient research on assessing their performance and robustness. This paper aims to establish a comprehensive evaluation framework for Text-to-Image models, concentrating particularly on their adherence to prompts. We created a novel dataset that aimed to assess the robustness of these models in generating images that conform to the specified factors of variation in the input text prompts. Our evaluation studies present…
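To make the idea of "factors of variation" in a prompt concrete, the following is a minimal sketch of how a prompt grid and a per-image adherence score might be set up. It is not the authors' dataset or pipeline: the factor names and values in `FACTORS`, the prompt template, and the helpers `build_prompts` and `adherence` are all hypothetical, and the step that extracts factor values from a generated image is left to the caller.

```python
from itertools import product

# Hypothetical factors of variation; the paper's actual factors and
# values are not listed in this summary.
FACTORS = {
    "shape": ["circle", "square"],
    "size": ["small", "large"],
    "position": ["left", "right"],
}


def build_prompts(factors):
    """Enumerate one (prompt, spec) pair per combination of factor values."""
    names = list(factors)
    prompts = []
    for values in product(*(factors[name] for name in names)):
        spec = dict(zip(names, values))
        # Assumed prompt template, for illustration only.
        text = (
            f"a binary image of a {spec['size']} {spec['shape']} "
            f"on the {spec['position']} side"
        )
        prompts.append((text, spec))
    return prompts


def adherence(requested, predicted):
    """Fraction of requested factor values that were recovered from an image.

    `predicted` is whatever an external judge (a classifier, a captioning
    model, a human rater) reports for the generated image.
    """
    hits = sum(predicted.get(key) == value for key, value in requested.items())
    return hits / len(requested)


if __name__ == "__main__":
    prompts = build_prompts(FACTORS)
    text, spec = prompts[0]
    # Pretend the judge got the shape and position right but the size wrong.
    judged = dict(spec, size="large")
    print(text)
    print(adherence(spec, judged))
```

Scoring each factor separately, rather than giving a single pass/fail per image, makes it possible to see which factor a model ignores most often.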
