Vision-Language Model Dialog Games for Self-Improvement
Ksenia Konyushkova, Christos Kaplanis, Serkan Cabi, Misha Denil

TL;DR
This paper introduces VLM Dialog Games, a self-play framework for vision-language models that automatically generates high-quality training data, enabling models to improve iteratively without relying on large external datasets.
Contribution
The paper proposes a scalable self-improvement method for VLMs using goal-oriented self-play and synthetic data generation, advancing beyond traditional data collection approaches.
Findings
Fine-tuning on synthetic data improves downstream task performance.
The approach generalizes across different datasets.
Iterative self-play enhances model capabilities over time.
Abstract
The increasing demand for high-quality, diverse training data poses a significant bottleneck in advancing vision-language models (VLMs). This paper presents VLM Dialog Games, a novel and scalable self-improvement framework for VLMs. Our approach leverages self-play between two agents engaged in a goal-oriented play centered around image identification. By filtering for successful game interactions, we automatically curate a high-quality dataset of interleaved images and text. We demonstrate that fine-tuning on this synthetic data leads to performance gains on downstream tasks and generalises across datasets. Moreover, because improvements in the model lead to better game play, this procedure can be applied iteratively. This work paves the way for self-improving VLMs, with potential applications in various real-world scenarios, especially when high-quality multimodal data is scarce.
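The play-then-filter loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not specify the game rules, so the sketch assumes one agent knows a target image, the other asks questions and then guesses which candidate is the target. The names `GameRecord`, `play_game`, `collect_successful_games`, and the `toy_*` agents are hypothetical, and images are stood in for by short text descriptions so the example runs without any model.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Dialog = List[Tuple[str, str]]  # (question, answer) pairs


@dataclass
class GameRecord:
    """One played game. Strings stand in for real images (assumption)."""
    candidates: List[str]
    target_index: int
    dialog: Dialog = field(default_factory=list)
    guess: int = -1

    @property
    def success(self) -> bool:
        return self.guess == self.target_index


def play_game(candidates: List[str], target_index: int,
              ask: Callable[[List[str], Dialog], str],
              answer: Callable[[str, str], str],
              guess: Callable[[List[str], Dialog], int],
              num_turns: int = 3) -> GameRecord:
    """Assumed game: the asker sees all candidates, the answerer sees only the target."""
    record = GameRecord(list(candidates), target_index)
    for _ in range(num_turns):
        question = ask(record.candidates, record.dialog)
        reply = answer(record.candidates[target_index], question)
        record.dialog.append((question, reply))
    record.guess = guess(record.candidates, record.dialog)
    return record


def collect_successful_games(pool: List[str], n_games: int, ask, answer, guess,
                             n_candidates: int = 2, seed: int = 0) -> List[GameRecord]:
    """Play n_games and keep only those where the target was identified."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_games):
        candidates = rng.sample(pool, n_candidates)
        target_index = rng.randrange(n_candidates)
        record = play_game(candidates, target_index, ask, answer, guess)
        if record.success:  # the filtering step: unsuccessful games are discarded
            kept.append(record)
    return kept


# Toy agents standing in for the two VLM roles (purely illustrative).
def toy_ask(candidates: List[str], dialog: Dialog) -> str:
    asked = {q for q, _ in dialog}
    for image in candidates:
        for word in image.split():
            if word not in asked:
                return word  # "question" = is this word in the target description?
    return candidates[0].split()[0]


def toy_answer(target_image: str, question: str) -> str:
    return "yes" if question in target_image.split() else "no"


def toy_guess(candidates: List[str], dialog: Dialog) -> int:
    def score(image: str) -> int:
        words = set(image.split())
        return sum((q in words) == (a == "yes") for q, a in dialog)
    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


pool = ["red circle", "blue square", "green triangle", "yellow star"]
games = collect_successful_games(pool, 20, toy_ask, toy_answer, toy_guess)
```

In a real setting the three callables would be backed by a VLM, the kept records would be converted into interleaved image-and-text training examples, and the fine-tuned model would replace the agents for the next round of play. Those steps are omitted here because the abstract does not describe them in detail.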
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies · Speech and dialogue systems
