Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation
Xingchen Li, Long Chen, Wenbo Ma, Yi Yang, Jun Xiao

TL;DR
This paper proposes a novel approach for Weakly Supervised Scene Graph Generation that integrates object-aware and interaction-aware knowledge to improve pseudo label quality and overall performance.
Contribution
It introduces a dual-teacher grounding framework that combines object and interaction knowledge with adaptive weighting for better weakly supervised scene graph generation.
Findings
Significant performance improvements on various weak supervision benchmarks.
Effective fusion of object and interaction knowledge enhances pseudo label reliability.
Adaptive weighting strategies improve training guidance for the grounding module.
Abstract
Recently, increasing effort has been devoted to Weakly Supervised Scene Graph Generation (WSSGG). Mainstream WSSGG solutions typically follow the same pipeline: they first align text entities in the weak image-level supervision (e.g., unlocalized relation triplets or captions) with image regions, and then train SGG models in a fully supervised manner with the aligned instance-level "pseudo" labels. However, we argue that most existing WSSGG works focus only on object-consistency, which means the grounded regions should have the same object category labels as the text entities. They neglect another basic requirement for an ideal alignment: interaction-consistency, which means the grounded region pairs should have the same interactions (i.e., visual relations) as the text entity pairs. Hence, in this paper, we propose to enhance a simple grounding module with both object-aware and…
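The difference between object-consistency and interaction-consistency can be illustrated with a toy grounding routine. The sketch below is not the authors' implementation: the function `ground_triplet`, the lookup tables, the scores, and the fixed weights `w_obj` and `w_int` are all hypothetical, and the fixed weights stand in for the adaptive weighting the paper describes. It only shows why an object-only score cannot decide between two regions of the same category, while an added interaction score can.

```python
from itertools import permutations

def ground_triplet(subj_label, predicate, obj_label, regions,
                   object_score, interaction_score, w_obj=0.5, w_int=0.5):
    """Return the ordered region pair that best matches a text triplet.

    object_score(region, label) and interaction_score(subj, obj, predicate)
    are hypothetical scorers returning values in [0, 1].
    """
    best_pair, best_score = None, float("-inf")
    for s, o in permutations(regions, 2):
        # Object-consistency: both regions should match their entity labels.
        obj_term = object_score(s, subj_label) * object_score(o, obj_label)
        # Interaction-consistency: the pair should exhibit the predicate.
        int_term = interaction_score(s, o, predicate)
        score = w_obj * obj_term + w_int * int_term
        if score > best_score:  # strict '>' keeps the first pair on ties
            best_pair, best_score = (s, o), score
    return best_pair, best_score

# Toy image: two person regions (r1, r2) and one horse region (r3).
# Only r2 is actually riding the horse. All numbers are made up.
REGIONS = ["r1", "r2", "r3"]
OBJECT_TABLE = {("r1", "person"): 0.9, ("r2", "person"): 0.9, ("r3", "horse"): 0.9}
INTERACTION_TABLE = {("r2", "r3", "riding"): 0.8}

def object_score(region, label):
    return OBJECT_TABLE.get((region, label), 0.1)

def interaction_score(subj, obj, predicate):
    return INTERACTION_TABLE.get((subj, obj, predicate), 0.1)

# Object-only grounding: r1 and r2 tie, so the first candidate is kept.
object_only_pair, _ = ground_triplet(
    "person", "riding", "horse", REGIONS,
    object_score, interaction_score, w_obj=1.0, w_int=0.0)

# Adding the interaction term breaks the tie in favour of the actual rider.
combined_pair, _ = ground_triplet(
    "person", "riding", "horse", REGIONS,
    object_score, interaction_score)

print(object_only_pair, combined_pair)
```

With object scores alone, both person regions are equally plausible subjects, so the resulting pseudo label depends on candidate order. Once the interaction term is included, the pair whose visual relation matches the text predicate is preferred.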
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Topic Modeling
Methods: ALIGN
