Taking A Closer Look at Interacting Objects: Interaction-Aware Open Vocabulary Scene Graph Generation
Lin Li, Chuhan Zhang, Dong Zhang, Chong Sun, Chen Li, and Long Chen

TL;DR
This paper introduces INOVA, an interaction-aware framework for open vocabulary scene graph generation that explicitly models object interactions to improve relation prediction and overall performance.
Contribution
INOVA is the first to incorporate explicit interaction modeling in open vocabulary scene graph generation, improving relation accuracy and robustness.
Findings
Achieves state-of-the-art results on the Visual Genome (VG) and GQA benchmarks.
Effectively distinguishes interacting objects from non-interacting ones.
Enhances robustness through interaction-consistent knowledge distillation.
Abstract
Today's open vocabulary scene graph generation (OVSGG) extends traditional SGG by recognizing novel objects and relationships beyond predefined categories, leveraging the knowledge from pre-trained large-scale models. Most existing methods adopt a two-stage pipeline: weakly supervised pre-training with image captions and supervised fine-tuning (SFT) on fully annotated scene graphs. Nonetheless, they omit explicit modeling of interacting objects and treat all objects equally, resulting in mismatched relation pairs. To this end, we propose an interaction-aware OVSGG framework INOVA. During pre-training, INOVA employs an interaction-aware target generation strategy to distinguish interacting objects from non-interacting ones. In SFT, INOVA devises an interaction-guided query selection tactic to prioritize interacting objects during bipartite graph matching. Besides, INOVA is equipped with…
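The idea of prioritizing interacting objects during bipartite matching can be illustrated with a toy sketch. This is not the paper's implementation: the function names (`select_queries`, `match`), the interaction scores, and the cost values are all hypothetical, and a brute-force search stands in for the Hungarian algorithm that a real system would typically use. It only shows how restricting the candidate queries to those with high interaction scores changes which queries get matched to ground-truth objects.

```python
from itertools import permutations

def select_queries(interaction_scores, k):
    """Return the indices of the k queries with the highest interaction score.
    (Hypothetical helper; the scores are assumed to come from some upstream model.)"""
    order = sorted(range(len(interaction_scores)),
                   key=lambda i: interaction_scores[i], reverse=True)
    return sorted(order[:k])

def match(cost, query_ids):
    """Brute-force minimum-cost one-to-one assignment of targets to queries.
    cost[q][t] is the cost of assigning query q to target t.
    Requires len(query_ids) >= number of targets."""
    n_targets = len(cost[0])
    best_total, best_pairs = float("inf"), None
    # Each permutation assigns one distinct query to every target.
    for perm in permutations(query_ids, n_targets):
        total = sum(cost[q][t] for t, q in enumerate(perm))
        if total < best_total:
            best_total = total
            best_pairs = [(q, t) for t, q in enumerate(perm)]
    return best_pairs, best_total

# Toy data (made up): 4 object queries, 2 ground-truth interacting objects.
interaction_scores = [0.9, 0.1, 0.8, 0.2]
cost = [
    [0.2, 0.9],
    [0.1, 0.8],
    [0.7, 0.3],
    [0.6, 0.1],
]

# Matching over all queries picks the cheapest pairs regardless of interaction.
all_pairs, all_total = match(cost, range(len(cost)))

# Matching over the selected queries only considers likely-interacting objects.
selected = select_queries(interaction_scores, k=2)
pairs, total = match(cost, selected)
```

In this toy setting, unrestricted matching assigns the targets to queries 1 and 3, which have low interaction scores, while matching over the selected subset assigns them to queries 0 and 2. A hard top-k filter is only one possible design; a soft variant could instead add an interaction-dependent term to the matching cost.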
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
