Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Jingxuan Li; Yuning Yang; Shengqi Yang; Linfan Zhang; Ying Nian Wu

arXiv:2411.11479 · cs.CL · June 4, 2025

TL;DR

This paper introduces Value-Spectrum, a benchmark that evaluates the value-based preferences of vision-language models (VLMs), using a large-scale short-video dataset and questions grounded in Schwartz's value dimensions.

Contribution

The paper presents a novel Visual Question Answering (VQA) benchmark based on Schwartz's value dimensions, together with a VLM agent pipeline that simulates video browsing over diverse social media content.

Findings

01. VLMs show significant variation in handling value-oriented content.

02. VLMs can adopt specific personas when prompted.

03. Value-Spectrum effectively tracks VLM preferences in value-based tasks.

Abstract

Recent progress in Vision-Language Models (VLMs) has broadened the scope of multimodal applications. However, evaluations often remain limited to functional tasks, neglecting abstract dimensions such as personality traits and human values. To address this gap, we introduce Value-Spectrum, a novel Visual Question Answering (VQA) benchmark that assesses VLMs along Schwartz's value dimensions, which capture the core human values guiding people's preferences and actions. We design a VLM agent pipeline to simulate video browsing and construct a vector database comprising over 50,000 short videos from TikTok, YouTube Shorts, and Instagram Reels. These videos span multiple months and cover diverse topics such as family, health, hobbies, society, and technology. Benchmarking on Value-Spectrum highlights notable variations in how VLMs handle value-oriented content. Beyond identifying…
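
As a rough illustration of what quantifying preferences across value dimensions could look like, the sketch below tallies hypothetical like/dislike answers per Schwartz value. The `preference_rates` function, the `(value, liked)` record format, and the sample answers are assumptions made for this example; they are not taken from the paper or its repository. The ten value names follow Schwartz's theory of basic human values.

```python
from collections import defaultdict

# The ten basic values in Schwartz's theory of basic human values.
SCHWARTZ_VALUES = (
    "Self-Direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Conformity", "Tradition", "Benevolence", "Universalism",
)


def preference_rates(responses):
    """Return the fraction of 'liked' answers for each value.

    `responses` is an iterable of (value, liked) pairs, where `value` is one
    of SCHWARTZ_VALUES and `liked` is a bool. This record format is a
    hypothetical stand-in for a model's answers to value-tagged videos.
    """
    liked = defaultdict(int)
    total = defaultdict(int)
    for value, is_liked in responses:
        if value not in SCHWARTZ_VALUES:
            raise ValueError(f"unknown value dimension: {value!r}")
        total[value] += 1
        liked[value] += int(is_liked)
    # Only values that appear at least once are reported.
    return {value: liked[value] / total[value] for value in total}


if __name__ == "__main__":
    # Made-up answers, for illustration only.
    sample = [("Hedonism", True), ("Hedonism", False), ("Security", True)]
    for value, rate in preference_rates(sample).items():
        print(f"{value}: {rate:.2f}")
```

Comparing such per-value rates across models, or for one model with and without a persona prompt, is one simple way to picture the kind of variation the findings describe.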

Code & Models

Repositories

Jeremyyny/Value-Spectrum

Videos

Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts

Taxonomy

Topics: Visual Attention and Saliency Detection

Methods: Sparse Evolutionary Training · ADaptive gradient method with the OPTimal convergence rate · Focus