Interaction-Centric Knowledge Infusion and Transfer for Open-Vocabulary Scene Graph Generation
Lin Li, Chuhan Zhang, Dong Zhang, Chong Sun, Chen Li, Long Chen

TL;DR
This paper introduces an interaction-centric framework for open-vocabulary scene graph generation that improves knowledge infusion and transfer by explicitly modeling interactions, leading to state-of-the-art results.
Contribution
It proposes a novel end-to-end interaction-driven OVSGG framework (ACC) that enhances knowledge infusion and transfer through interaction prompts, query selection, and knowledge distillation.
Findings
Achieves state-of-the-art performance on three benchmarks.
Effectively distinguishes interacting from non-interacting object instances.
Reduces noise and ambiguity in knowledge transfer processes.
Abstract
Open-vocabulary scene graph generation (OVSGG) extends traditional SGG by recognizing novel objects and relationships beyond predefined categories, leveraging knowledge from pre-trained large-scale models. Existing OVSGG methods always adopt a two-stage pipeline: 1) infusing knowledge into large-scale models via pre-training on large datasets; 2) transferring knowledge from pre-trained models with fully annotated scene graphs during supervised fine-tuning. However, due to a lack of explicit interaction modeling, these methods struggle to distinguish between interacting and non-interacting instances of the same object category. This limitation induces critical issues in both stages of OVSGG: it generates noisy pseudo-supervision from mismatched objects during knowledge infusion, and causes ambiguous query matching during knowledge transfer. To this end, in this…
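The matching ambiguity described in the abstract can be made concrete with a toy example. The sketch below is a hypothetical illustration, not the authors' ACC method. It matches the subject of a ground-truth triplet (person, riding, horse) to two detected "person" boxes. A category-only cost, standing in for matching by class label alone, ties both persons. Adding an assumed box-overlap penalty with the object, a crude proxy for an interaction cue, singles out the rider. The box coordinates, the cost functions, and the IoU-based penalty are all assumptions made for illustration.

```python
# Toy illustration (hypothetical, NOT the ACC method): why category-only
# matching cannot tell an interacting instance from a non-interacting one.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def category_cost(det, subj_label):
    """0 if the detected category matches the triplet subject, else 1."""
    return 0.0 if det["label"] == subj_label else 1.0

def interaction_cost(det, subj_label, obj_box, weight=1.0):
    """Category cost plus an assumed spatial-interaction penalty (1 - IoU)."""
    return category_cost(det, subj_label) + weight * (1.0 - iou(det["box"], obj_box))

def best_candidates(costs, tol=1e-9):
    """Indices of all detections tied at the minimum cost."""
    lowest = min(costs)
    return [i for i, c in enumerate(costs) if c - lowest <= tol]

# Hypothetical ground-truth triplet: (person, riding, horse).
horse_box = (20, 40, 120, 120)
detections = [
    {"label": "person", "box": (10, 10, 50, 90)},    # rider, overlaps the horse
    {"label": "person", "box": (200, 10, 240, 90)},  # bystander, no overlap
]

cat_costs = [category_cost(d, "person") for d in detections]
int_costs = [interaction_cost(d, "person", horse_box) for d in detections]

print(best_candidates(cat_costs))  # [0, 1]: ambiguous, both persons tie
print(best_candidates(int_costs))  # [0]: the interacting person wins
```

The same tie would arise when grounding a caption such as "a man riding a horse" to boxes for pseudo-supervision: a category-only grounding assigns the phrase to the bystander as readily as to the rider, which is the kind of noisy pseudo-label the abstract attributes to the knowledge-infusion stage.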
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks
