DILLEMA: Diffusion and Large Language Models for Multi-Modal Augmentation
Luciano Baresi, Davide Yi Xian Hu, Muhammad Irfan Mas'udi, Giovanni Quattrocchi

TL;DR
This paper introduces DILLEMA, a novel framework that combines Large Language Models and Diffusion Models to generate realistic, diverse test cases for evaluating and improving the robustness of vision neural networks.
Contribution
It presents a new method for creating high-fidelity, counterfactual test images from textual descriptions, enhancing robustness testing beyond existing augmentation techniques.
Findings
Generated test cases reveal model weaknesses
Improved model robustness through targeted retraining
High human agreement on image realism
Abstract
Ensuring the robustness of deep learning models requires comprehensive and diverse testing. Existing approaches, often based on simple data augmentation techniques or generative adversarial networks, are limited in producing realistic and varied test cases. To address these limitations, we present a novel framework for testing vision neural networks that leverages Large Language Models and control-conditioned Diffusion Models to generate synthetic, high-fidelity test cases. Our approach begins by translating images into detailed textual descriptions using a captioning model, allowing the language model to identify modifiable aspects of the image and generate counterfactual descriptions. These descriptions are then used to produce new test images through a text-to-image diffusion process that preserves spatial consistency and maintains the critical elements of the scene. We demonstrate…
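The pipeline described in the abstract (caption the image, have a language model propose counterfactual descriptions, then render each description with a diffusion model conditioned on the original image) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names TestCase, make_counterfactuals, and generate_test_cases are hypothetical, the three model stages are passed in as plain callables, and the language-model step is replaced by simple phrase substitution. Reusing the original label for every generated image is also an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class TestCase:
    """One generated test input (hypothetical container, not from the paper)."""
    caption: str                 # description of the original image
    counterfactual_caption: str  # modified description used as the prompt
    image: Any                   # output of the image generator
    label: str                   # assumption: the original label is reused


def make_counterfactuals(caption: str,
                         alternatives: Dict[str, List[str]]) -> List[str]:
    """Stand-in for the language-model step.

    For each modifiable phrase that occurs in the caption, emit one new
    caption per alternative value. A real system would ask an LLM to
    identify the modifiable aspects instead of using a fixed table.
    """
    results = []
    for phrase, options in alternatives.items():
        if phrase in caption:
            results.extend(caption.replace(phrase, option) for option in options)
    return results


def generate_test_cases(image: Any,
                        label: str,
                        captioner: Callable[[Any], str],
                        rewriter: Callable[[str], List[str]],
                        generator: Callable[[Any, str], Any]) -> List[TestCase]:
    """Chain the three stages: caption -> counterfactuals -> new images."""
    caption = captioner(image)
    return [
        # The generator receives the original image as well as the prompt,
        # standing in for control-conditioned diffusion.
        TestCase(caption, cf, generator(image, cf), label)
        for cf in rewriter(caption)
    ]


# Usage with stub models so the sketch runs without any ML dependencies.
cases = generate_test_cases(
    image="street.png",
    label="car",
    captioner=lambda img: "a red car on a sunny street",
    rewriter=lambda c: make_counterfactuals(
        c, {"sunny": ["rainy", "snowy"], "red": ["blue"]}),
    generator=lambda img, prompt: f"<image of: {prompt}>",
)
for case in cases:
    print(case.counterfactual_caption)
```

Passing the stages as callables keeps the sketch independent of any particular captioning, language, or diffusion model; real models could be substituted for the stubs without changing generate_test_cases.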
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Diffusion
