Assisting Scene Graph Generation with Self-Supervision
Sandeep Inuganti, Vineeth N Balasubramanian

TL;DR
This paper introduces three self-supervision tasks as auxiliary training objectives for scene graph generation, leveraging pre-trained object detectors to improve accuracy and distinguish relationship types, achieving state-of-the-art results.
Contribution
It proposes novel self-supervision tasks for scene graph generation that enhance model performance and relationship understanding without relying on additional annotations.
Findings
Achieved state-of-the-art results on the Visual Genome dataset.
Improved distinction between geometric and possessive relationships.
Enhanced model performance using self-supervision with pre-trained detectors.
Abstract
Research in scene graph generation has quickly gained traction in the past few years because of its potential to help in downstream tasks such as visual question answering and image captioning. Many interesting approaches have been proposed to tackle this problem. Most of these works use a pre-trained object detection model as a preliminary feature extractor, so obtaining object bounding box proposals from the detector is relatively cheap. We take advantage of this ready availability of bounding box annotations produced by the pre-trained detector. We propose a set of three novel yet simple self-supervision tasks and train them as auxiliary tasks alongside the main model. In our comparisons, training the base model from scratch with these self-supervision tasks achieves state-of-the-art results on all metrics and recall settings. We also resolve some of the…
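The abstract's general recipe (derive free labels from detector boxes, then train auxiliary objectives next to the main one) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the abstract does not say what the three tasks are, so `relative_position_label` is a hypothetical pretext task rather than one of the paper's, and `total_loss` with its per-task weights is a generic multi-task combination, not the authors' formulation.

```python
from typing import Dict, Tuple

# Box format assumed for this sketch: (x1, y1, x2, y2) in image coordinates.
Box = Tuple[float, float, float, float]


def relative_position_label(subj: Box, obj: Box) -> int:
    """Hypothetical pretext label computed from two detector boxes.

    Returns which side of the object's centre the subject's centre lies on:
    0 = left, 1 = right, 2 = above, 3 = below (y grows downward).
    No human annotation is needed; the label comes from box geometry alone.
    """
    sx, sy = (subj[0] + subj[2]) / 2, (subj[1] + subj[3]) / 2
    ox, oy = (obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2
    dx, dy = sx - ox, sy - oy
    if abs(dx) >= abs(dy):  # horizontal offset dominates
        return 0 if dx < 0 else 1
    return 2 if dy < 0 else 3


def total_loss(main_loss: float,
               aux_losses: Dict[str, float],
               weights: Dict[str, float]) -> float:
    """Generic multi-task objective: main loss plus weighted auxiliary losses.

    Tasks missing from `weights` default to a weight of 1.0 (an assumption).
    """
    return main_loss + sum(weights.get(name, 1.0) * value
                           for name, value in aux_losses.items())
```

In a real training loop the loss values would be framework tensors rather than floats; the weighting scheme is a design choice that the abstract does not specify.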
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
