PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN
Hanwang Zhang, Zawlin Kyaw, Jinyang Yu, Shih-Fu Chang

TL;DR
This paper introduces PPR-FCN, a novel weakly supervised model for detecting subject-predicate-object relations in images, efficiently handling large region pair sets with shared computation and a new score map for context.
Contribution
The paper proposes a parallel FCN architecture with pairwise RoI pooling and a position-role-sensitive score map for improved weakly supervised visual relation detection.
Findings
PPR-FCN outperforms baseline methods on relation detection benchmarks.
The model effectively captures context with the position-role-sensitive score map.
Shared computation reduces complexity in processing large region pairs.
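The shared-computation point can be illustrated with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the authors' implementation. It assumes R-FCN-style position-sensitive RoI pooling over two role-specific score-map groups (the names `subj_maps` and `obj_maps`, and the helpers `ps_roi_pool` and `pair_scores`, are hypothetical), and it combines the two roles by simple addition, leaving out any joint context term. The score maps are computed once per image and each region is pooled once per role, so scoring all N x N ordered pairs costs 2N poolings plus one broadcasted sum.

```python
import numpy as np

def ps_roi_pool(score_maps, box, k=3):
    """Position-sensitive RoI pooling in the style of R-FCN (illustrative).

    score_maps: array of shape (k*k, C, H, W), one group of C maps per spatial bin.
    box: (x0, y0, x1, y1) in feature-map coordinates, assumed to lie inside the map.
    Returns a length-C score vector averaged over the k*k bins.
    """
    x0, y0, x1, y1 = box
    xs = np.linspace(x0, x1, k + 1)
    ys = np.linspace(y0, y1, k + 1)
    out = np.zeros(score_maps.shape[1])
    for i in range(k):          # bin row
        for j in range(k):      # bin column
            ya = int(np.floor(ys[i]))
            yb = max(int(np.ceil(ys[i + 1])), ya + 1)  # keep at least one row
            xa = int(np.floor(xs[j]))
            xb = max(int(np.ceil(xs[j + 1])), xa + 1)  # keep at least one column
            # Bin (i, j) reads only from its own dedicated group of maps.
            out += score_maps[i * k + j, :, ya:yb, xa:xb].mean(axis=(1, 2))
    return out / (k * k)

def pair_scores(subj_maps, obj_maps, boxes, k=3):
    """Score every ordered (subject, object) pair of regions.

    Assumption: the pair score is the sum of a subject-role score and an
    object-role score. Each region is pooled once per role, then reused.
    """
    subj = np.stack([ps_roi_pool(subj_maps, b, k) for b in boxes])  # (N, C)
    obj = np.stack([ps_roi_pool(obj_maps, b, k) for b in boxes])    # (N, C)
    return subj[:, None, :] + obj[None, :, :]                       # (N, N, C)

# Toy usage: 4 predicate classes, a 12x12 feature map, 3 candidate regions.
rng = np.random.default_rng(0)
subj_maps = rng.normal(size=(9, 4, 12, 12))
obj_maps = rng.normal(size=(9, 4, 12, 12))
boxes = [(0, 0, 6, 6), (3, 2, 12, 10), (1, 1, 4, 9)]
scores = pair_scores(subj_maps, obj_maps, boxes)  # shape (3, 3, 4)
```

Entry `scores[a, b]` holds the predicate scores for region `a` as subject and region `b` as object. The diagonal (a region paired with itself) would normally be masked out.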
Abstract
We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD) to detect "subject-predicate-object" relations in an image with object relation ground truths available only at the image level. This is motivated by the fact that it is extremely expensive to label the combinatorial relations between objects at the instance level. Compared to the extensively studied problem of Weakly Supervised Object Detection (WSOD), WSVRD is more challenging as it needs to examine a large set of region pairs, which is computationally prohibitive and more likely to get stuck in a locally optimal solution, such as one involving wrong spatial context. To this end, we present a Parallel, Pairwise Region-based, Fully Convolutional Network (PPR-FCN) for WSVRD. It uses a parallel FCN architecture that simultaneously performs pair selection and classification of single regions and region…
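To make the idea of training from image-level labels concrete, here is a minimal sketch of how a selection branch and a classification branch could be combined into one image-level prediction. It uses a two-stream aggregation in the style of WSDDN, swapped in here for illustration. The abstract does not specify the aggregation rule, so the class-agnostic selection softmax and the names `softmax` and `image_level_scores` are assumptions.

```python
import numpy as np

def softmax(x, axis):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def image_level_scores(sel_logits, cls_logits):
    """Aggregate per-pair outputs into image-level predicate scores.

    sel_logits: (P,) selection logits, one per candidate region pair.
    cls_logits: (P, R) predicate logits for each pair.
    Assumption: selection is class-agnostic and normalised over pairs;
    classification is normalised over predicates for each pair.
    """
    sel = softmax(sel_logits, axis=0)          # (P,), sums to 1 over pairs
    cls = softmax(cls_logits, axis=1)          # (P, R), each row sums to 1
    return (sel[:, None] * cls).sum(axis=0)    # (R,), sums to 1

# Toy usage: 3 candidate pairs, 4 predicates. The second pair is strongly
# selected, so its predicate distribution dominates the image-level output.
sel_logits = np.array([0.0, 8.0, 0.0])
cls_logits = np.array([[2.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 5.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])
image_scores = image_level_scores(sel_logits, cls_logits)
```

Because the output is a differentiable function of both branches, a loss against image-level relation labels can train pair selection and pair classification together without instance-level annotations.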
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Max Pooling · Convolution · Fully Convolutional Network
