One-shot Scene Graph Generation

Yuyu Guo; Jingkuan Song; Lianli Gao; Heng Tao Shen

arXiv:2202.10824 · cs.CV · March 1, 2022 · 5 cites

TL;DR

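This paper introduces the One-Shot Scene Graph Generation task, in which each relationship triplet has only one labeled example, and a method that combines prior knowledge (Relational Knowledge and Commonsense Knowledge) with an Instance Relation Transformer encoder. The authors report that it outperforms existing state-of-the-art models by a large margin.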

Contribution

It proposes Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge), combined with a transformer encoder, for one-shot scene graph generation.

Findings

01. Outperforms state-of-the-art methods by a large margin on a constructed dataset.

02. Demonstrates the effectiveness of combining relational and commonsense knowledge.

03. Validates the benefits of using an Instance Relation Transformer encoder.

Abstract

As a structured representation of the image content, the visual scene graph (visual relationship) acts as a bridge between computer vision and natural language processing. Existing models on the scene graph generation task notoriously require tens or hundreds of labeled samples. By contrast, human beings can learn visual relationships from a few or even one example. Inspired by this, we design a task named One-Shot Scene Graph Generation, where each relationship triplet (e.g., "dog-has-head") comes from only one labeled example. The key insight is that rather than learning from scratch, one can utilize rich prior knowledge. In this paper, we propose Multiple Structured Knowledge (Relational Knowledge and Commonsense Knowledge) for the one-shot scene graph generation task. Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted…
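To make the task setup concrete, here is a minimal sketch of how a one-shot split could be built, so that every relationship triplet keeps exactly one labeled example. This is an illustration under stated assumptions, not the authors' procedure: the function name `build_one_shot_split`, the tuple layout of `annotations`, and the image ids are all hypothetical, and the paper may select its examples differently.

```python
import random
from collections import defaultdict


def build_one_shot_split(annotations, seed=0):
    """Keep one labeled example per (subject, predicate, object) triplet.

    `annotations` is a list of (image_id, subject, predicate, object)
    tuples (a hypothetical layout chosen for this sketch).
    Returns (support, held_out): `support` maps each triplet to the single
    image id kept as its labeled example; `held_out` maps each triplet to
    the remaining image ids.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group all labeled occurrences by their relationship triplet.
    by_triplet = defaultdict(list)
    for image_id, subj, pred, obj in annotations:
        by_triplet[(subj, pred, obj)].append(image_id)

    support, held_out = {}, {}
    for triplet, image_ids in by_triplet.items():
        keep = rng.randrange(len(image_ids))  # index of the one example kept
        support[triplet] = image_ids[keep]
        held_out[triplet] = image_ids[:keep] + image_ids[keep + 1:]
    return support, held_out


# Toy data: "dog-has-head" appears in two images, "man-riding-horse" in one.
annotations = [
    ("img_1", "dog", "has", "head"),
    ("img_2", "dog", "has", "head"),
    ("img_3", "man", "riding", "horse"),
]
support, held_out = build_one_shot_split(annotations)
for triplet, image_id in support.items():
    print("-".join(triplet), "->", image_id)
```

Sampling by index rather than by value keeps the split correct even when the same image id is listed more than once for a triplet.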

Code & Models

Repositories

gyy8426/os-sgg · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · RoIPool · Multi-Head Attention · Convolution · Adam · Label Smoothing · Position-Wise Feed-Forward Layer