TL;DR
This paper introduces visual distant supervision, a novel paradigm for scene graph generation that aligns commonsense knowledge bases with images to automatically create large-scale labeled data, reducing reliance on human annotations and outperforming weakly and semi-supervised baselines.
Contribution
The work proposes a distant supervision framework for visual relation learning that automatically generates labeled data from a knowledge base and reduces label noise by iteratively estimating probabilistic relation labels and eliminating the noisy ones.
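The core labeling idea (aligning a commonsense knowledge base with the objects found in an image) can be illustrated with a minimal sketch. It assumes object categories have already been detected, uses a hypothetical toy knowledge base `KB` and a hypothetical helper `distant_labels`, and applies the standard distant supervision assumption that every knowledge-base relation between two categories is a candidate, possibly noisy, label for that object pair. It is not the authors' implementation.

```python
from itertools import permutations

# Hypothetical toy commonsense knowledge base: (subject, object) -> set of predicates.
KB = {
    ("person", "horse"): {"riding", "feeding"},
    ("horse", "grass"): {"eating", "standing on"},
}

def distant_labels(object_labels, kb):
    """Assign candidate relation labels to every ordered pair of detected objects.

    object_labels: list of category names, one per detected object (index = object id).
    Returns {(i, j): set_of_predicates} only for pairs whose categories appear in the KB.
    """
    labels = {}
    for i, j in permutations(range(len(object_labels)), 2):
        preds = kb.get((object_labels[i], object_labels[j]))
        if preds:
            # Distant supervision assumption: every KB relation between the two
            # categories becomes a candidate label, even if the image does not show it.
            labels[(i, j)] = set(preds)
    return labels

print(distant_labels(["person", "horse", "grass"], KB))
```

Note that the pair (person, horse) receives both "riding" and "feeding" even though an image typically depicts at most one of them; this is exactly the label noise the denoising step has to handle.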
Findings
Outperforms weakly and semi-supervised baselines
Achieves significant improvements over fully supervised models when human-labeled data is additionally incorporated in a semi-supervised fashion
Demonstrates effectiveness of knowledge-base aligned distant supervision
Abstract
Scene graph generation aims to identify objects and their relations in images, providing structured image representations that can facilitate numerous applications in computer vision. However, scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation. In this work, we propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data. The intuition is that by aligning commonsense knowledge bases and images, we can automatically create large-scale labeled data to provide distant supervision for visual relation learning. To alleviate the noise in distantly labeled data, we further propose a framework that iteratively estimates the probabilistic relation labels and eliminates the noisy ones. Comprehensive experimental results show that our…
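The denoising loop described in the abstract (iteratively estimating probabilistic relation labels and eliminating noisy ones) can be pictured with the toy EM-style sketch below. It is a swapped-in illustration, not the paper's framework: instead of retraining a scene graph model each round, it re-estimates a global predicate prior from the current soft labels and multiplies it by fixed per-pair `visual_scores`, a hypothetical stand-in for model predictions. The function name `denoise` and the pruning threshold `tau` are likewise assumptions.

```python
from collections import defaultdict

def denoise(candidates, visual_scores, rounds=3, tau=0.2):
    """Iteratively estimate probabilistic relation labels and prune noisy ones.

    candidates: {pair_id: non-empty set of candidate predicates from distant supervision}
    visual_scores: {pair_id: {predicate: score > 0}}, a hypothetical stand-in for a
        scene graph model's per-pair predictions (a real system would retrain each round).
    Returns {pair_id: {predicate: probability}} over the surviving candidates.
    """
    # Start from uniform soft labels over each pair's candidate set.
    soft = {p: {r: 1.0 / len(rs) for r in rs} for p, rs in candidates.items() if rs}
    for _ in range(rounds):
        # Toy "training" step: estimate a global predicate prior from the soft labels.
        prior = defaultdict(float)
        for dist in soft.values():
            for r, q in dist.items():
                prior[r] += q
        total = sum(prior.values())
        new_soft = {}
        for p, dist in soft.items():
            # Estimation step: combine the prior with per-pair evidence, restricted
            # to the pair's current candidates, then renormalize.
            evidence = visual_scores.get(p, {})
            scores = {r: (prior[r] / total) * evidence.get(r, 1e-6) for r in dist}
            z = sum(scores.values())
            probs = {r: s / z for r, s in scores.items()}
            # Elimination step: drop low-probability labels, always keeping the best one.
            best = max(probs, key=probs.get)
            kept = {r: q for r, q in probs.items() if q >= tau or r == best}
            z_kept = sum(kept.values())
            new_soft[p] = {r: q / z_kept for r, q in kept.items()}
        soft = new_soft
    return soft

candidates = {"a": {"riding", "feeding"}, "b": {"riding", "feeding"},
              "c": {"eating", "standing on"}}
visual_scores = {"a": {"riding": 0.9, "feeding": 0.1},
                 "b": {"riding": 0.75, "feeding": 0.25},
                 "c": {"eating": 0.3, "standing on": 0.7}}
print(denoise(candidates, visual_scores))
```

In this toy run, pair "b" keeps both labels after one round but loses "feeding" in the second, because the prior has shifted toward "riding" once pair "a" was pruned. That coupling across the dataset is what makes the estimate iterative rather than a single thresholding pass.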
