TL;DR
This paper introduces LOGIN, a novel scene graph generation framework that captures local and global interactions, explicitly models relational direction, and achieves state-of-the-art results on Visual Genome.
Contribution
The paper proposes a Local-to-Global Interaction Networks framework with direction-aware features and a new diagnostic task, improving scene graph generation accuracy.
Findings
LOGIN outperforms existing methods on the Visual Genome benchmark.
LOGIN effectively distinguishes relational direction in scene graphs.
The Attract & Repel loss enhances predicate embedding quality.
Abstract
In this work, we seek new insights into the underlying challenges of the Scene Graph Generation (SGG) task. Quantitative and qualitative analysis of the Visual Genome dataset points to three issues: 1) Ambiguity: even when inter-object relationships share the same object (or predicate), they may not be visually or semantically similar; 2) Asymmetry: although relationships are inherently directional, direction was not well addressed in previous studies; and 3) Higher-order contexts: leveraging the identities of certain graph elements can help generate accurate scene graphs. Motivated by this analysis, we design a novel SGG framework, Local-to-Global Interaction Networks (LOGIN). Locally, interactions extract the essence between three instances (subject, object, and background) while baking direction awareness into the network by explicitly constraining the input order of subject and…
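The idea of constraining input order can be illustrated with a toy example. The sketch below is not LOGIN's architecture; the function names and feature vectors are hypothetical. It only shows why an order-preserving combination of subject and object features can tell "man riding horse" from "horse riding man", while a symmetric combination cannot.

```python
# Toy illustration only: names and feature values are assumptions,
# not taken from the paper.

def ordered_pair(subject, obj):
    """Concatenate subject features, then object features (order preserved)."""
    return tuple(subject) + tuple(obj)

def symmetric_pair(subject, obj):
    """Element-wise sum of the two feature vectors (order lost)."""
    return tuple(s + o for s, o in zip(subject, obj))

man = (1.0, 0.0)    # hypothetical feature vector for "man"
horse = (0.0, 1.0)  # hypothetical feature vector for "horse"

# Swapping subject and object changes the ordered representation...
print(ordered_pair(man, horse) == ordered_pair(horse, man))      # False
# ...but leaves the symmetric one unchanged.
print(symmetric_pair(man, horse) == symmetric_pair(horse, man))  # True
```

Any downstream predicate classifier fed the symmetric representation would have to assign the same label to both directions, which is the asymmetry problem the abstract describes.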
Taxonomy
Methods: Attentive Walk-Aggregating Graph Neural Network
