CollagePrompt: A Benchmark for Budget-Friendly Visual Recognition with GPT-4V
Siyu Xu, Yunke Wang, Daochang Liu, Bo Du, Chang Xu

TL;DR
This paper introduces CollagePrompt, a benchmark for cost-effective visual recognition using GPT-4V by collaging multiple images into a single prompt, and proposes optimization methods to improve accuracy and efficiency.
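The idea of collaging can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: it computes where each image would be pasted in a square grid and how the per-image cost shrinks if one collage prompt is billed like one single-image prompt. The function names, the square-grid layout, and that billing assumption are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def collage_boxes(grid: int, cell: int) -> List[Box]:
    """Return paste boxes for a grid x grid collage, in row-major order."""
    boxes = []
    for idx in range(grid * grid):
        row, col = divmod(idx, grid)
        left, top = col * cell, row * cell
        boxes.append((left, top, left + cell, top + cell))
    return boxes


def cost_per_image(cost_per_prompt: float, grid: int) -> float:
    """Amortized cost per image, ASSUMING a collage prompt costs the same
    as a single-image prompt (an illustrative simplification)."""
    return cost_per_prompt / (grid * grid)
```

Under that assumption, a 2x2 collage would cut the per-image cost to a quarter and a 3x3 collage to a ninth; actual pricing depends on how the provider bills image inputs.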
Contribution
It presents a new collage prompting benchmark and a genetic algorithm-based method to optimize image layouts for cheaper, accurate visual recognition with GPT-4V.
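To make the layout-optimization idea concrete, here is a minimal genetic-algorithm sketch over image arrangements. It is an illustration under stated assumptions rather than the paper's method: the operators (order crossover, swap mutation, keep-the-best-half selection), the hyperparameters, and the `fitness` callable are all hypothetical stand-ins for whatever scoring and search settings the authors use.

```python
import random
from typing import Callable, List

Layout = List[int]  # layout[p] = index of the image placed at grid position p


def order_crossover(a: Layout, b: Layout, rng: random.Random) -> Layout:
    """Copy a slice from parent a, then fill the gaps in parent b's order."""
    n = len(a)
    i, j = sorted(rng.sample(range(n), 2))
    child = [-1] * n
    child[i:j + 1] = a[i:j + 1]
    kept = set(child[i:j + 1])
    fill = [g for g in b if g not in kept]
    for p in range(n):
        if child[p] == -1:
            child[p] = fill.pop(0)
    return child


def swap_mutation(layout: Layout, rng: random.Random, rate: float) -> Layout:
    """With probability `rate`, swap the images at two random positions."""
    out = list(layout)
    if rng.random() < rate:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def optimize_layout(
    n_images: int,
    fitness: Callable[[Layout], float],  # hypothetical scorer, higher is better
    generations: int = 50,
    pop_size: int = 20,
    mutation_rate: float = 0.3,
    seed: int = 0,
) -> Layout:
    """Search for a high-fitness arrangement of n_images (n_images >= 2)."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_images), n_images) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            pa, pb = rng.sample(elite, 2)
            child = order_crossover(pa, pb, rng)
            children.append(swap_mutation(child, rng, mutation_rate))
        population = elite + children
    return max(population, key=fitness)


# Toy usage with made-up numbers: place harder images at positions that are
# assumed to be recognized more reliably.
position_weight = [0.9, 0.7, 0.6, 0.8]    # hypothetical per-position accuracy
image_difficulty = [0.2, 0.9, 0.5, 0.4]   # hypothetical per-image difficulty


def toy_fitness(layout: Layout) -> float:
    return sum(position_weight[p] * image_difficulty[img]
               for p, img in enumerate(layout))


best = optimize_layout(4, toy_fitness)
```

In practice the fitness function is the expensive part: querying GPT-4V for every candidate layout would defeat the budget goal, so some cheaper estimate of collage accuracy would be needed in its place.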
Findings
Recognition accuracy depends on an image's position within the collage.
Grouping same-category images improves recognition.
Incorrect labels often originate from adjacent images.
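The first and third findings above correspond to simple statistics over collage predictions. The sketch below shows one way such numbers could be tallied; it is not the authors' evaluation code, and the data layout (row-major label lists per collage) and the 4-neighbor definition of "adjacent" are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def position_accuracy(
    true_grids: Sequence[List[str]], pred_grids: Sequence[List[str]]
) -> Dict[int, float]:
    """Accuracy at each grid position, aggregated over many collages."""
    hits: Dict[int, int] = defaultdict(int)
    totals: Dict[int, int] = defaultdict(int)
    for truth, pred in zip(true_grids, pred_grids):
        for pos, (t, p) in enumerate(zip(truth, pred)):
            totals[pos] += 1
            hits[pos] += int(t == p)
    return {pos: hits[pos] / totals[pos] for pos in totals}


def neighbor_error_rate(
    true_grids: Sequence[List[str]], pred_grids: Sequence[List[str]], grid: int
) -> float:
    """Share of wrong predictions that equal the true label of a cell
    directly above, below, left, or right (assumed adjacency)."""
    wrong = 0
    from_neighbor = 0
    for truth, pred in zip(true_grids, pred_grids):
        for pos in range(grid * grid):
            if pred[pos] == truth[pos]:
                continue
            wrong += 1
            row, col = divmod(pos, grid)
            cells = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
            labels = {truth[r * grid + c] for r, c in cells
                      if 0 <= r < grid and 0 <= c < grid}
            from_neighbor += int(pred[pos] in labels)
    return from_neighbor / wrong if wrong else 0.0
```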
Abstract
Recent advancements in generative AI have suggested that by taking visual prompts, GPT-4V can demonstrate significant proficiency in visual recognition tasks. Despite its impressive capabilities, the financial cost associated with GPT-4V's inference presents a substantial barrier to its wide use. To address this challenge, we propose a budget-friendly collage prompting task that collages multiple images into a single visual prompt and makes GPT-4V perform visual recognition on several images simultaneously, thereby reducing the cost. We collect a dataset of various collage prompts to assess its performance in GPT-4V's visual recognition. Our evaluations reveal several key findings: 1) Recognition accuracy varies with different positions in the collage. 2) Grouping images of the same category together leads to better visual recognition results. 3) Incorrect labels often come from…
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Currency Recognition and Detection · Imbalanced Data Classification Techniques
