Can Vision-Language Models Replace Human Annotators: A Case Study with CelebA Dataset

Haoming Lu; Feifei Zhong

arXiv:2410.09416 · cs.CV · October 15, 2024

Open Access

TL;DR

This study demonstrates that Vision-Language Models (VLMs) can produce high-quality image annotations on CelebA at a fraction of the cost of manual annotation, making them a promising, scalable alternative.

Contribution

The study provides empirical evidence that VLMs can replace human annotators for certain annotation tasks, with improved consistency when disagreed cases are re-annotated and with substantial cost savings.

Findings

01

VLM annotations agree with the original human labels in 79.5% of cases

02

Re-annotating disagreed cases and taking a majority vote raises consistency to 89.1%

03

AI annotation costs less than 1% of manual annotation

Abstract

This study evaluates the capability of Vision-Language Models (VLMs) in image data annotation by comparing their performance on the CelebA dataset against manual annotation in terms of quality and cost-effectiveness. Annotations from the state-of-the-art LLaVA-NeXT model on 1,000 CelebA images are in 79.5% agreement with the original human annotations. Incorporating re-annotations of disagreed cases into a majority vote boosts AI annotation consistency to 89.1%, and even higher for more objective labels. Cost assessments demonstrate that AI annotation significantly reduces expenditures compared to traditional manual methods, representing less than 1% of the cost of manual annotation for the CelebA dataset. These findings support the potential of VLMs as a viable, cost-effective alternative for specific annotation tasks, reducing both financial burden and ethical concerns associated…
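The agreement and re-annotation figures in the abstract can be read as a simple procedure. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it treats each (image, attribute) pair as one label in a flat list, assumes "agreement" is the fraction of matching labels, and uses a hypothetical `reannotate` callback standing in for extra VLM queries on disagreed cases. The number of re-annotations per item and the tie-breaking rule are also assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical sketch, not the paper's code: each element is one label,
# e.g. one (image, attribute) pair from CelebA, flattened into a list.


def agreement_rate(ai_labels: Sequence[Hashable], human_labels: Sequence[Hashable]) -> float:
    """Fraction of positions where the AI label equals the human label."""
    if len(ai_labels) != len(human_labels) or not ai_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == h for a, h in zip(ai_labels, human_labels))
    return matches / len(ai_labels)


def majority_vote(votes: Sequence[Hashable]) -> Hashable:
    """Most common label; ties go to the label seen first (an assumption)."""
    return Counter(votes).most_common(1)[0][0]


def resolve_disagreements(
    ai_labels: Sequence[Hashable],
    human_labels: Sequence[Hashable],
    reannotate: Callable[[int], Sequence[Hashable]],
) -> list:
    """Keep agreed labels; for disagreed positions, majority-vote the original
    AI label together with extra AI labels from the hypothetical `reannotate`."""
    resolved = []
    for i, (a, h) in enumerate(zip(ai_labels, human_labels)):
        if a == h:
            resolved.append(a)
        else:
            resolved.append(majority_vote([a, *reannotate(i)]))
    return resolved


if __name__ == "__main__":
    # Toy labels (1 = attribute present, 0 = absent), not data from the paper.
    human = [1, 1, 1, 0]
    ai = [1, 0, 1, 1]
    extra = {1: [1, 1], 3: [1, 0]}  # hypothetical re-annotations for disagreed items

    resolved = resolve_disagreements(ai, human, lambda i: extra[i])
    print(agreement_rate(ai, human))        # 0.5
    print(resolved)                         # [1, 1, 1, 1]
    print(agreement_rate(resolved, human))  # 0.75
```

This summary does not specify whether the 89.1% figure measures agreement with the original human labels or a different notion of consistency; the final `agreement_rate` call illustrates only one plausible reading.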

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling