StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

Rushikesh Zawar, Shaurya Dewan, Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

arXiv:2406.13735 · cs.CV · June 21, 2024



TL;DR

StableSemantics introduces a large-scale synthetic dataset with semantic attributions, combining human prompts, captions, images, and attention maps to advance scene understanding in computer vision.

Contribution

The paper presents the first dataset of diffusion-generated images with explicit semantic attributions, including human prompts, synthetic images, and attention maps, aimed at improving visual semantic understanding.
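The attention maps mentioned above come from the text-conditioned cross-attention layers of a diffusion model. As a rough illustration only (not the authors' pipeline), the sketch below computes scaled dot-product cross-attention between image-side queries and prompt-token keys, then averages one token's attention column over several maps into a normalized spatial heatmap, in the spirit of DAAM-style (Diffusion Attentive Attribution Maps) attribution. The function names `cross_attention_probs` and `token_heatmap`, the random inputs, and the plain averaging over maps of a single resolution are all assumptions; real pipelines typically upsample maps from different U-Net resolutions before aggregating them.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_probs(queries: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Hypothetical helper.
    queries: (num_pixels, d) image-side features; keys: (num_tokens, d) text-side features.
    Returns (num_pixels, num_tokens) attention probabilities; each row sums to 1."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)  # scaled dot-product scores
    return softmax(scores, axis=-1)         # normalize over prompt tokens

def token_heatmap(prob_maps, token_index: int, height: int, width: int) -> np.ndarray:
    """Hypothetical helper: average one token's attention column over maps
    (e.g. layers/heads/timesteps, all assumed to share one resolution),
    reshape it to (height, width), and min-max normalize it to [0, 1]."""
    stacked = np.stack([p[:, token_index] for p in prob_maps])  # (n_maps, H*W)
    heat = stacked.mean(axis=0).reshape(height, width)
    heat = heat - heat.min()
    peak = heat.max()
    return heat / peak if peak > 0 else heat

# Usage with random features standing in for real model activations.
rng = np.random.default_rng(0)
H, W, d, T = 8, 8, 16, 5  # spatial size, feature dim, number of prompt tokens
maps = [cross_attention_probs(rng.normal(size=(H * W, d)), rng.normal(size=(T, d)))
        for _ in range(3)]
heat = token_heatmap(maps, token_index=2, height=H, width=W)
print(heat.shape)
```

Averaging over maps is only one aggregation choice; summing upsampled maps, or restricting to particular layers or timesteps, are common alternatives.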

Findings

01. Analyzed the semantic distribution of the generated images

02. Examined the distribution of objects within images

03. Benchmarked captioning and segmentation methods

Abstract

Understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. A key aspect of this challenge is that objects sharing similar semantic meanings or functions can exhibit striking visual differences, making accurate identification and categorization difficult. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statistics. These frameworks account for the visual variability of objects, as well as complex object co-occurrences and sources of noise such as diverse lighting conditions. By leveraging large-scale datasets and cross-attention conditioning, these models generate detailed and contextually rich scene representations. This capability opens new avenues for improving object recognition and scene understanding in varied and challenging environments. Our work presents StableSemantics, a dataset comprising…


Code & Models

Repositories

rishidarkdevil/daam-i2i (PyTorch)


Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · Diffusion