Crowdsourcing of Real-world Image Annotation via Visual Properties

Xiaolei Diao; Fausto Giunchiglia

arXiv:2604.14449·cs.CV·April 17, 2026

TL;DR

This paper presents a crowdsourcing framework that uses visual property constraints and an interactive question-based approach to improve real-world image annotation, addressing semantic gaps and reducing subjectivity.

Contribution

It introduces a novel interactive crowdsourcing method combining knowledge representation, NLP, and computer vision to enhance annotation quality and consistency.

Findings

01. Effective reduction of annotator subjectivity.

02. Improved annotation accuracy demonstrated in experiments.

03. Framework guides annotators via visual property-based questions.

Abstract

Recent advances in data-centric artificial intelligence highlight inherent limitations in object recognition datasets. One of the primary issues stems from the semantic gap problem, which results in complex many-to-many mappings between visual data and linguistic descriptions. This bias adversely affects performance in computer vision tasks. This paper proposes an image annotation methodology that integrates knowledge representation, natural language processing, and computer vision techniques, aiming to reduce annotator subjectivity by applying visual property constraints. We introduce an interactive crowdsourcing framework that dynamically asks questions based on a predefined object category hierarchy and annotator feedback, guiding image annotation by visual properties. Experiments demonstrate the effectiveness of this methodology, and annotator feedback is discussed to optimize the…
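The question-driven loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Category` class, the `annotate` function, the yes/no question template, and the toy hierarchy are all assumptions made here for illustration. The sketch only shows the general idea of walking a predefined category hierarchy and letting annotator answers about visual properties decide which branch to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Category:
    """A node in a hypothetical object category hierarchy."""
    name: str
    # Visual property that separates this category from its siblings (illustrative).
    visual_property: Optional[str] = None
    children: List["Category"] = field(default_factory=list)


def annotate(root: Category, answer: Callable[[str], bool]) -> str:
    """Walk the hierarchy, asking one yes/no property question per candidate child.

    Returns the name of the most specific category the annotator confirmed.
    """
    node = root
    while node.children:
        for child in node.children:
            question = f"Does the object have {child.visual_property}?"
            if answer(question):
                node = child  # annotator confirmed the property: descend
                break
        else:
            # No child property was confirmed: stop at the current level.
            break
    return node.name


# Toy hierarchy; category names and properties are invented for illustration.
hierarchy = Category("instrument", children=[
    Category("string instrument", "strings", children=[
        Category("guitar", "a fretted neck"),
        Category("violin", "a bow-played body without frets"),
    ]),
    Category("wind instrument", "a mouthpiece"),
])


def simulated_annotator(question: str) -> bool:
    """Stand-in for a human annotator who confirms two fixed properties."""
    confirmed = {"strings", "a fretted neck"}
    return any(question == f"Does the object have {p}?" for p in confirmed)


label = annotate(hierarchy, simulated_annotator)
print(label)
```

In this sketch the annotator never picks a label from free text. Each answer concerns a single visual property, and the label falls out of the path taken through the hierarchy, which is one plausible way to read the abstract's goal of constraining subjectivity. When no property is confirmed, the walk stops at the current, more general category rather than forcing a specific label.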
