TL;DR
This paper introduces ADARI, a large dataset of art images and descriptive sentences, to facilitate multimodal word sense disambiguation in creative practice, addressing the challenge of subjective image labeling.
Contribution
It provides a novel dataset and baseline methods for subjective image description and multimodal disambiguation in creative domains.
Findings
ADARI contains 240k images and 260k descriptions across multiple creative sub-domains.
BERT-based analysis reveals the complexity of subjective labels like 'dynamic'.
Multimodal approaches show promise for understanding ambiguous design language.
Abstract
Language is ambiguous; many terms and expressions can convey the same idea. This is especially true in creative practice, where ideas and design intents are highly subjective. We present a dataset, Ambiguous Descriptions of Art Images (ADARI), of contemporary workpieces, which aims to provide a foundational resource for subjective image description and multimodal word disambiguation in the context of creative practice. The dataset contains a total of 240k images labeled with 260k descriptive sentences. It is additionally organized into sub-domains of architecture, art, design, fashion, furniture, product design and technology. In subjective image description, labels are not deterministic: for example, the ambiguous label dynamic might correspond to hundreds of different images. To understand this complexity, we analyze the ambiguity and relevance of text with respect to images using the…
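The abstract's point that a single label such as dynamic can map to many images can be made concrete with a small inverted index. The sketch below is a hypothetical illustration, not the authors' analysis. The records are invented toy data, not ADARI entries. The function names `build_label_index` and `domain_entropy` are assumptions. Counting images per label and measuring a label's spread across sub-domains (Shannon entropy in bits) are used here only as crude proxies for ambiguity.

```python
from collections import defaultdict
from math import log2

# Hypothetical toy records (image_id, sub_domain, sentence); NOT drawn from ADARI.
records = [
    ("img1", "architecture", "A dynamic facade of folded steel panels."),
    ("img2", "fashion", "A dynamic silhouette that moves with the body."),
    ("img3", "furniture", "A dynamic chair balanced on a single leg."),
    ("img4", "furniture", "A minimal oak table with tapered legs."),
    ("img5", "architecture", "A minimal concrete pavilion."),
]

def build_label_index(records, labels):
    """Map each candidate label to the set of images whose sentence contains it,
    and record the sub-domain of every match."""
    index = defaultdict(set)
    domains = defaultdict(list)
    for image_id, domain, sentence in records:
        # Simple whitespace tokenization with trailing punctuation stripped.
        tokens = {t.strip(".,;:").lower() for t in sentence.split()}
        for label in labels:
            if label in tokens:
                index[label].add(image_id)
                domains[label].append(domain)
    return index, domains

def domain_entropy(domain_list):
    """Shannon entropy (bits) of how a label's matches spread across sub-domains."""
    n = len(domain_list)
    if n == 0:
        return 0.0
    counts = defaultdict(int)
    for d in domain_list:
        counts[d] += 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

index, domains = build_label_index(records, ["dynamic", "minimal"])
for label in ("dynamic", "minimal"):
    # Prints: label, number of images, entropy over sub-domains.
    print(label, len(index[label]), round(domain_entropy(domains[label]), 3))
# dynamic 3 1.585
# minimal 2 1.0
```

Higher image counts and higher entropy both suggest that a label is used more broadly. A real analysis would use contextual text representations rather than exact token matches.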
Taxonomy
Methods: Linear Layer · Attention Dropout · Adam · Dense Connections · Linear Warmup With Linear Decay · Residual Connection · Dropout · Multi-Head Attention · Layer Normalization
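One tagged method, Linear Warmup With Linear Decay, is a learning-rate schedule. The sketch below shows its common form, which is an assumption on our part: the rate rises linearly from 0 to a base value over the warmup steps, then falls linearly to 0 at the final step. The function name and the example values (base rate 1e-4, 100 warmup steps, 1000 total steps) are hypothetical. They are not settings reported for ADARI.

```python
def linear_warmup_linear_decay(step: int, base_lr: float,
                               warmup_steps: int, total_steps: int) -> float:
    """Hypothetical schedule: linear ramp 0 -> base_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: fraction of warmup completed times the base rate.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fraction of post-warmup steps remaining; clamp at 0 past the end.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_linear_decay(s, 1e-4, 100, 1000))
```

The peak rate occurs at the end of warmup. Steps beyond `total_steps` return 0 rather than a negative rate.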
