ConvGenVisMo: Evaluation of Conversational Generative Vision Models
Narjes Nikzad Khasmakhi, Meysam Asgari-Chenaghlu, Nabiha Asghar, Philipp Schaer, Dietlind Zühlke

TL;DR
ConvGenVisMo introduces a comprehensive evaluation framework, dataset, and metrics for assessing conversational generative vision models, facilitating better understanding of their performance in realistic scenarios.
Contribution
This paper presents ConvGenVisMo, a new benchmark and evaluation suite designed to assess conversational generative vision models (CGVMs) in realistic settings.
Findings
New benchmark dataset for CGVM evaluation
Automated metrics for assessing visual and conversational outputs
Public availability of evaluation tools and dataset
Abstract
Conversational generative vision models (CGVMs) like Visual ChatGPT (Wu et al., 2023) have recently emerged from the synthesis of computer vision and natural language processing techniques. These models enable more natural and interactive communication between humans and machines, because they can understand verbal inputs from users and generate responses in natural language along with visual outputs. To make informed decisions about the usage and deployment of these models, it is important to analyze their performance through a suitable evaluation framework on realistic datasets. In this paper, we present ConvGenVisMo, a framework for the novel task of evaluating CGVMs. ConvGenVisMo introduces a new benchmark evaluation dataset for this task, and also provides a suite of existing and new automated evaluation metrics to evaluate the outputs. All ConvGenVisMo assets, including the…
Taxonomy
Topics: Multimodal Machine Learning Applications · AI in Service Interactions · Domain Adaptation and Few-Shot Learning
