Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation
Gengcong Yang, Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yujiu Yang

TL;DR
This paper introduces a probabilistic uncertainty modeling approach for scene graph generation. The approach captures semantic ambiguity and enables diverse, fine-grained relationship predictions, improving mean recall on the Visual Genome benchmark.
Contribution
The paper proposes a plug-and-play Probabilistic Uncertainty Modeling (PUM) module that models each union region as a Gaussian distribution, handling semantic ambiguity and promoting diverse relationship predictions.
Findings
Achieves state-of-the-art mean recall on Visual Genome
Effectively models semantic ambiguity with Gaussian distributions
Enhances diversity and fine-grained relationship coverage
Abstract
To generate "accurate" scene graphs, almost all existing methods predict pairwise relationships in a deterministic manner. However, we argue that visual relationships are often semantically ambiguous. Specifically, inspired by linguistic knowledge, we classify the ambiguity into three types: Synonymy Ambiguity, Hyponymy Ambiguity, and Multi-view Ambiguity. The ambiguity naturally leads to the issue of implicit multi-label, motivating the need for diverse predictions. In this work, we propose a novel plug-and-play Probabilistic Uncertainty Modeling (PUM) module. It models each union region as a Gaussian distribution, whose variance measures the uncertainty of the corresponding visual content. Compared to the conventional deterministic methods, such uncertainty modeling brings stochasticity of feature representation, which naturally enables diverse predictions. As a byproduct, PUM…
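The idea of representing a union region as a Gaussian and sampling from it can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the linear heads (`w_mu`, `w_logvar`, `w_cls`), the random weights, and the dimensions are all assumptions made for illustration, and the log-variance parameterization is one common choice rather than something the abstract specifies.

```python
import numpy as np

def gaussian_embedding(feature, w_mu, w_logvar):
    """Map a union-region feature to the mean and variance of a diagonal Gaussian.

    Hypothetical linear heads; the real module may use a different architecture.
    """
    mu = feature @ w_mu
    var = np.exp(feature @ w_logvar)  # exponentiate so the variance stays positive
    return mu, var

def sample_embeddings(mu, var, n_samples, rng):
    """Draw samples z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + np.sqrt(var) * eps

def predict_predicates(samples, w_cls):
    """Score each sampled embedding and return the top predicate index per sample."""
    logits = samples @ w_cls
    return logits.argmax(axis=1)

rng = np.random.default_rng(0)
feat_dim, emb_dim, n_predicates = 16, 8, 5    # illustrative sizes only
feature = rng.standard_normal(feat_dim)        # stand-in for a union-region feature
w_mu = rng.standard_normal((feat_dim, emb_dim)) * 0.1
w_logvar = rng.standard_normal((feat_dim, emb_dim)) * 0.1
w_cls = rng.standard_normal((emb_dim, n_predicates))

mu, var = gaussian_embedding(feature, w_mu, w_logvar)
samples = sample_embeddings(mu, var, n_samples=10, rng=rng)
predictions = predict_predicates(samples, w_cls)
uncertainty = float(var.mean())  # one possible scalar summary of the variance
print(predictions, uncertainty)
```

Because each sample perturbs the mean by noise scaled with the standard deviation, a region with larger variance can yield different predicate indices across samples, while a region with zero variance always reproduces the deterministic prediction from the mean.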
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
