Perception of Visual Content: Differences Between Humans and Foundation Models

Nardiena A. Pratama; Shaoyang Fan; Gianluca Demartini

arXiv:2411.18968·cs.CV·April 29, 2025

Open Access

TL;DR

This paper compares human and machine-generated image annotations across diverse socio-economic contexts, revealing differences in perception, biases, and their effects on ML model performance, emphasizing the complementary roles of both annotation types.

Contribution

It provides a comprehensive analysis of how human and ML annotations differ in perception and bias, and evaluates their impact on model accuracy across socio-economic variables.

Findings

01

ML captions excel in region classification and income regression.

02

Human annotations are more effective for non-action categories.

03

Both annotation types are important; human annotations are not fully replaceable.

Abstract

Human-annotated content is often used to train machine learning (ML) models. Recently, however, language and multi-modal foundational models have been used to replace and scale up human annotators' efforts. This study explores the similarity between human-generated and ML-generated annotations of images across diverse socio-economic contexts (RQ1) and their impact on ML model performance and bias (RQ2). We aim to understand differences in perception and identify potential biases in content interpretation. Our dataset comprises images of people from various geographical regions and income levels, covering various daily activities and home environments. ML captions and human labels show highest similarity at a low level, i.e., types of words that appear and sentence structures, but all annotations are consistent in how they perceive images across regions. ML captions resulted in best…

Taxonomy

Topics: Color perception and design