Context-Dependent Diffusion Network for Visual Relationship Detection
Zhen Cui, Chunyan Xu, Wenming Zheng, Jian Yang

TL;DR
This paper introduces a context-dependent diffusion network (CDDN) for visual relationship detection. It builds a word semantic graph and a visual scene graph to capture interactions between object instances, and it is reported to achieve state-of-the-art results.
Contribution
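The paper proposes CDDN, a diffusion network framework that combines a word semantic graph built from language priors with a visual scene graph. The goal is to handle the extreme diversity and combinatorial explosion of subject-predicate-object triplets in visual relationship detection.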
Findings
Reported to outperform existing methods on benchmark datasets, reaching state-of-the-art accuracy in visual relationship detection.
Models object interactions by diffusing context over the semantic and visual scene graphs.
Abstract
Visual relationship detection can bridge the gap between computer vision and natural language for scene understanding of images. Unlike pure object recognition tasks, the subject-predicate-object relation triplets lie in an extremely diverse space, such as person-behind-person and car-behind-building, while suffering from combinatorial explosion. In this paper, we propose a context-dependent diffusion network (CDDN) framework for visual relationship detection. To capture the interactions of different object instances, two types of graphs, a word semantic graph and a visual scene graph, are constructed to encode global context interdependency. The semantic graph is built through language priors to model semantic correlations across objects, whilst the visual scene graph defines the connections of scene objects so as to utilize the…
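As a rough illustration of the graph-based context idea described in the abstract, the Python sketch below builds a toy semantic graph from word vectors and runs one smoothing step over it. It is not the CDDN architecture, which the truncated abstract does not specify. The cosine-similarity threshold, the `alpha` mixing weight, the function names, and the two-dimensional word vectors are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_adjacency(embeddings, threshold=0.5):
    """Connect two object categories when their word vectors are similar.

    Assumption: similarity of word embeddings stands in for the
    'language priors' mentioned in the abstract.
    """
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(embeddings[i], embeddings[j])
                if sim >= threshold:
                    adj[i][j] = sim
    return adj


def diffuse(adj, features, steps=1, alpha=0.5):
    """Mix each node's features with the weighted mean of its neighbours.

    Assumption: a simple random-walk style smoothing, chosen only to show
    how context can spread over a graph. Isolated nodes are left unchanged.
    """
    n = len(adj)
    dim = len(features[0])
    current = [row[:] for row in features]
    for _ in range(steps):
        updated = []
        for i in range(n):
            degree = sum(adj[i])
            if degree == 0:
                updated.append(current[i][:])
                continue
            neighbour_mean = [
                sum(adj[i][j] * current[j][d] for j in range(n)) / degree
                for d in range(dim)
            ]
            updated.append([
                (1 - alpha) * current[i][d] + alpha * neighbour_mean[d]
                for d in range(dim)
            ]
            )
        current = updated
    return current


# Hypothetical word vectors for three object categories.
labels = ["person", "horse", "building"]
word_vectors = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
# Hypothetical one-dimensional node features (for example, a detector score).
node_features = [[1.0], [0.0], [0.0]]

graph = semantic_adjacency(word_vectors)
smoothed = diffuse(graph, node_features, steps=1)
```

In this toy setup "person" and "building" are not directly connected, so information from "person" reaches "building" only after a second diffusion step through "horse". A visual scene graph could be handled the same way, with edges defined by the spatial layout of detected boxes instead of word similarity.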
