Constructing a Visual Relationship Authenticity Dataset
Chenhui Chu, Yuto Takebayashi, Mishra Vipul, Yuta Nakashima

TL;DR
This paper introduces a new dataset that annotates both true and false visual relationships in images, aiming to improve scene understanding and grounded language processing.
Contribution
A dataset that annotates both true and false visual relationships among the objects mentioned in Flickr30k Entities captions, filling a gap left by existing datasets, which contain only true relationships.
Findings
The dataset annotates both true and false relationships among objects mentioned in image captions.
Facilitates research in distinguishing correct and incorrect visual relationships.
Supports advancements in vision and language understanding.
Abstract
A visual relationship denotes a relationship between two objects in an image, which can be represented as a triplet (subject; predicate; object). Visual relationship detection is crucial for scene understanding in images. Existing visual relationship detection datasets contain only true relationships that correctly describe the content of an image. However, distinguishing false visual relationships from true ones is also crucial for image understanding and grounded natural language processing. In this paper, we construct a visual relationship authenticity dataset, in which both true and false relationships among all objects that appear in the captions of the Flickr30k Entities image caption dataset are annotated. The dataset is available at https://github.com/codecreator2053/VR_ClassifiedDataset. We hope that this dataset can promote the study of both vision and language understanding.
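As a minimal illustration of the triplet representation described in the abstract, the Python sketch below models a relationship together with an authenticity label. The class name, field names, helper function, and example triplets are all hypothetical; they are assumptions made for illustration and do not reflect the actual file format of the released dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LabeledRelationship:
    """Hypothetical record: a (subject; predicate; object) triplet plus a label."""
    subject: str
    predicate: str
    obj: str          # the object of the relationship
    is_true: bool     # True if the triplet correctly describes the image


def split_by_authenticity(
    rels: List[LabeledRelationship],
) -> Tuple[List[LabeledRelationship], List[LabeledRelationship]]:
    """Separate relationships into (true, false) lists, preserving order."""
    true_rels = [r for r in rels if r.is_true]
    false_rels = [r for r in rels if not r.is_true]
    return true_rels, false_rels


# Invented examples for a single image; not taken from the dataset.
examples = [
    LabeledRelationship("man", "riding", "horse", True),
    LabeledRelationship("horse", "riding", "man", False),
]

true_rels, false_rels = split_by_authenticity(examples)
```

Keeping the label on the triplet itself, rather than in a separate list, makes it straightforward to treat the task as binary classification over (image, triplet) pairs.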
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
