Knowledge-augmented Few-shot Visual Relation Detection

Tianyu Yu; Yangning Li; Jiaoyan Chen; Yinghui Li; Hai-Tao Zheng; Xi Chen; Qingbin Liu; Wenqiang Liu; Dongxiao Huang; Bei Wu; Yexin Wang

arXiv:2303.05342 · cs.CV · March 10, 2023 · 5 cites

Open Access

TL;DR

This paper introduces a knowledge-augmented few-shot visual relation detection framework that leverages textual and visual relation knowledge to significantly improve generalization and outperform existing models on standard benchmarks.

Contribution

The paper proposes a novel framework combining textual and visual relation knowledge to enhance few-shot VRD performance, addressing generalization issues of prior methods.

Findings

1. Outperforms state-of-the-art models on three Visual Genome benchmarks.
2. Significantly improves generalization ability for few-shot VRD.
3. Effectiveness validated through extensive experiments.

Abstract

Visual Relation Detection (VRD) aims to detect relationships between objects for image understanding. Most existing VRD methods rely on thousands of training samples for each relationship to achieve satisfactory performance. Some recent papers tackle this problem with few-shot learning, using elaborately designed pipelines and pre-trained word vectors. However, the performance of existing few-shot VRD models is severely hampered by their poor generalization capability, as they struggle to handle the vast semantic diversity of visual relationships. In contrast, humans can learn new relationships from just a few examples by drawing on their knowledge. Inspired by this, we devise a knowledge-augmented few-shot VRD framework that leverages both textual knowledge and visual relation knowledge to improve the generalization ability of few-shot VRD. The textual knowledge and visual relation…
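The abstract describes the framework only at a high level (and is truncated here), so the following is not the authors' method. As a hedged illustration of the few-shot setting it targets, the sketch below shows a generic prototype-based classifier for relation predicates: each class prototype is the mean of its K support embeddings, optionally blended with a text embedding of the predicate name, which is one simple way textual knowledge such as pre-trained word vectors could enter a few-shot classifier. The function names, the cosine scoring, the blending weight `alpha`, and the toy vectors are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    # L2-normalize a vector; a zero vector is returned unchanged.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def class_prototypes(feats: List[Vector], labels: List[int], n_way: int) -> List[Vector]:
    # Prototype of class c = mean of the support features labelled c (K-shot average).
    protos = []
    for c in range(n_way):
        members = [f for f, y in zip(feats, labels) if y == c]
        dim = len(members[0])
        protos.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
    return protos


def fuse_with_text(visual_protos: List[Vector], text_embs: List[Vector],
                   alpha: float = 0.5) -> List[Vector]:
    # Hypothetical fusion: convex combination of the normalized visual prototype
    # and the normalized text embedding of the predicate name.
    # alpha = 1.0 uses visual evidence only; alpha = 0.0 uses text only.
    fused = []
    for v, t in zip(visual_protos, text_embs):
        vn, tn = normalize(v), normalize(t)
        fused.append([alpha * a + (1 - alpha) * b for a, b in zip(vn, tn)])
    return fused


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity between two vectors.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def classify(queries: List[Vector], protos: List[Vector]) -> List[int]:
    # Assign each query to the prototype with the highest cosine similarity.
    return [max(range(len(protos)), key=lambda c: cosine(q, protos[c])) for q in queries]


# Toy 2-way 2-shot episode with 3-dimensional features (made-up numbers).
support = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
labels = [0, 0, 1, 1]
text = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.2]]  # hypothetical predicate-name embeddings
protos = fuse_with_text(class_prototypes(support, labels, 2), text)
print(classify([[0.8, 0.1, 0.0], [0.1, 0.7, 0.1]], protos))  # → [0, 1]
```

Blending in normalized space keeps the visual and textual terms on the same scale, so `alpha` directly controls how much the predicate's textual prior shifts each prototype when only K visual examples are available.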

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques