VCD: A Dataset for Visual Commonsense Discovery in Images

Xiangqing Shen; Fanfan Wang; Siwei Wu; Rui Xia

arXiv:2402.17213 · cs.CV · June 6, 2025 · 1 citation


TL;DR

VCD is a large-scale dataset of structured visual commonsense knowledge grounded in images, supporting reasoning about both the observable and the unseen aspects of visual scenes.

Contribution

The paper introduces VCD, a novel dataset with a three-level taxonomy for visual commonsense, and a generative model, VCM, for discovering diverse visual commonsense.

Findings

1. VCD contains over 100,000 images and 14 million object-commonsense pairs.
2. VCD's taxonomy covers Seen and Unseen commonsense in Property, Action, and Space.
3. VCM effectively discovers diverse visual commonsense from images.

Abstract

Visual commonsense plays a vital role in understanding and reasoning about the visual world. While commonsense knowledge bases like ConceptNet provide structured collections of general facts, they lack visually grounded representations. Scene graph datasets like Visual Genome, though rich in object-level descriptions, primarily focus on directly observable information and lack systematic categorization of commonsense knowledge. We present Visual Commonsense Dataset (VCD), a large-scale dataset containing over 100,000 images and 14 million object-commonsense pairs that bridges this gap. VCD introduces a novel three-level taxonomy for visual commonsense, integrating both Seen (directly observable) and Unseen (inferrable) commonsense across Property, Action, and Space aspects. Each commonsense is represented as a triple where the head entity is grounded to object bounding boxes in images,…
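The abstract describes each commonsense as a triple whose head entity is grounded to an object bounding box, labeled as Seen or Unseen and as Property, Action, or Space. The following is a minimal Python sketch of such a record, not VCD's actual schema or release format: the class and field names, the (x_min, y_min, x_max, y_max) box convention, the relation strings, and the example values are all hypothetical. Any finer-grained taxonomy level is left out because the excerpt above does not spell it out.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Iterable, Tuple


class Visibility(Enum):
    SEEN = "Seen"      # directly observable in the image
    UNSEEN = "Unseen"  # not visible, but inferable from the scene


class Aspect(Enum):
    PROPERTY = "Property"
    ACTION = "Action"
    SPACE = "Space"


# Every (visibility, aspect) pair named in the abstract: 2 x 3 = 6 categories.
# Any finer-grained taxonomy level is not modeled here (not described in the excerpt).
ALL_CATEGORIES = list(product(Visibility, Aspect))

# Assumed (hypothetical) box convention: (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class CommonsenseTriple:
    """One object-commonsense pair: (head, relation, tail) plus its category.

    Hypothetical field layout for illustration; not the dataset's real schema.
    """
    image_id: str
    head: str          # object name of the head entity
    head_box: BBox     # image region the head entity is grounded to
    relation: str
    tail: str
    visibility: Visibility
    aspect: Aspect

    def __post_init__(self) -> None:
        # Reject empty or inverted boxes so every head is grounded to a real region.
        x0, y0, x1, y1 = self.head_box
        if not (x0 < x1 and y0 < y1):
            raise ValueError(f"degenerate bounding box: {self.head_box}")


def count_by_category(triples: Iterable[CommonsenseTriple]) -> Counter:
    """Tally triples per (visibility, aspect) category; absent categories read as 0."""
    return Counter((t.visibility, t.aspect) for t in triples)


if __name__ == "__main__":
    # Hypothetical examples for illustration only; not taken from VCD.
    examples = [
        CommonsenseTriple("img_0001", "cup", (40, 60, 120, 150),
                          "has color", "white", Visibility.SEEN, Aspect.PROPERTY),
        CommonsenseTriple("img_0001", "cup", (40, 60, 120, 150),
                          "usually located in", "kitchen", Visibility.UNSEEN, Aspect.SPACE),
    ]
    counts = count_by_category(examples)
    for visibility, aspect in ALL_CATEGORIES:
        print(f"{visibility.value:<6} {aspect.value:<8} {counts[(visibility, aspect)]}")
```

Encoding the two category axes as enums rather than free-text labels keeps the 2 x 3 grid closed, so per-category statistics cannot silently split on a misspelled label.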


Videos

VCD: A Dataset for Visual Commonsense Discovery in Images · Underline

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: Balanced Selection