One-shot Scene Graph Generation
Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen

TL;DR
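This paper introduces one-shot scene graph generation, a task in which each relationship triplet is learned from a single labeled example. The proposed method combines relational and commonsense prior knowledge with an Instance Relation Transformer encoder and outperforms existing scene graph generation models by a large margin on a constructed dataset.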
Contribution
It proposes combining Multiple Structured Knowledge (relational knowledge and commonsense knowledge) with an Instance Relation Transformer encoder to enable one-shot scene graph generation.
Findings
Outperforms state-of-the-art methods by a large margin on a constructed dataset.
Demonstrates the effectiveness of combining relational and commonsense knowledge.
Validates the benefits of using an Instance Relation Transformer encoder.
Abstract
As a structured representation of the image content, the visual scene graph (visual relationship) acts as a bridge between computer vision and natural language processing. Existing models on the scene graph generation task notoriously require tens or hundreds of labeled samples. By contrast, human beings can learn visual relationships from a few or even one example. Inspired by this, we design a task named One-Shot Scene Graph Generation, where each relationship triplet (e.g., "dog-has-head") comes from only one labeled example. The key insight is that rather than learning from scratch, one can utilize rich prior knowledge. In this paper, we propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task. Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted…
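The one-shot setting described in the abstract, where each relationship triplet has only one labeled example, can be illustrated with a minimal sketch. This is not the authors' dataset construction: the annotation format (a list of `(image_id, subject, predicate, object)` tuples) and the function name `one_shot_split` are assumptions made here for illustration only.

```python
import random
from collections import defaultdict


def one_shot_split(annotations, seed=0):
    """Keep exactly one labeled example per (subject, predicate, object) triplet.

    `annotations` is a hypothetical list of
    (image_id, subject, predicate, object) tuples; the paper's actual
    annotation format is not given in this summary.
    """
    rng = random.Random(seed)

    # Group every image that contains a given triplet.
    images_by_triplet = defaultdict(list)
    for image_id, subj, pred, obj in annotations:
        images_by_triplet[(subj, pred, obj)].append(image_id)

    # Pick a single supporting image for each triplet (sorted for determinism).
    return {
        triplet: rng.choice(image_ids)
        for triplet, image_ids in sorted(images_by_triplet.items())
    }


if __name__ == "__main__":
    demo = [
        ("img1", "dog", "has", "head"),
        ("img2", "dog", "has", "head"),
        ("img3", "man", "riding", "horse"),
    ]
    for triplet, image_id in one_shot_split(demo).items():
        print("-".join(triplet), "<-", image_id)
```

Each triplet, such as "dog-has-head", ends up with one supporting image, no matter how many annotated examples were available before the split.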
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · RoIPool · Multi-Head Attention · Convolution · Adam · Label Smoothing · Position-Wise Feed-Forward Layer
