Generating Triples with Adversarial Networks for Scene Graph Construction
Matthew Klawonn, Eric Heim

TL;DR
This paper introduces a method based on Generative Adversarial Networks that generates scene graphs from images, capturing relationships between objects and their attributes without requiring bounding box labels at training time.
Contribution
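The paper presents a GAN-based approach that generates scene graphs with attributes from images without bounding box supervision. According to the abstract, previous works require bounding box information at train time and do not attempt to generate attributes.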
Findings
Outperforms previous models on standard datasets
Handles larger vocabulary sizes effectively
Produces scene graphs with attribute information without bounding boxes
Abstract
Driven by successes in deep learning, computer vision research has begun to move beyond object detection and image classification to more sophisticated tasks like image captioning or visual question answering. Motivating such endeavors is the desire for models to capture not only objects present in an image, but more fine-grained aspects of a scene such as relationships between objects and their attributes. Scene graphs provide a formal construct for capturing these aspects of an image. Despite this, there have been only a few recent efforts to generate scene graphs from imagery. Previous works limit themselves to settings where bounding box information is available at train time and do not attempt to generate scene graphs with attributes. In this paper we propose a method, based on recent advancements in Generative Adversarial Networks, to overcome these deficiencies. We take the…
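To make the notion of a scene graph concrete, the sketch below merges a handful of (subject, predicate, object) triples into a single graph of objects, attributes, and relationships. It is a minimal illustration and not the paper's method: the function name `build_scene_graph`, the dictionary layout, and the convention that the predicate "is" marks an attribute are all assumptions made here for the example.

```python
from collections import defaultdict


def build_scene_graph(triples):
    """Merge (subject, predicate, object) triples into one scene graph.

    Assumption for this sketch only: a triple whose predicate is "is"
    describes an attribute of the subject; every other triple describes
    a relationship between two objects.
    """
    nodes = set()                   # object nodes
    attributes = defaultdict(set)   # object -> set of attribute strings
    relations = set()               # (subject, predicate, object) edges

    for subj, pred, obj in triples:
        nodes.add(subj)
        if pred == "is":
            attributes[subj].add(obj)
        else:
            nodes.add(obj)
            relations.add((subj, pred, obj))

    return {
        "nodes": nodes,
        "attributes": dict(attributes),
        "relations": relations,
    }


# Hypothetical triples; the duplicate is collapsed because sets are used.
triples = [
    ("dog", "on", "grass"),
    ("dog", "is", "brown"),
    ("dog", "on", "grass"),
]
graph = build_scene_graph(triples)
```

Storing nodes and edges in sets means a statement produced more than once contributes a single edge, which is one simple way to combine many small statements into one graph.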
