GEMS: Scene Expansion using Generative Models of Graphs
Rishi Agarwal, Tirupati Saketh Chandra, Vaidehi Patil, Aniruddha Mahapatra, Kuldeep Kulkarni, Vishwa Vinay

TL;DR
GEMS expands a seed scene graph by sequentially predicting new objects and then their relationships to the objects already in the graph. It uses external knowledge during training, introduces new evaluation metrics, and outperforms baseline models on standard datasets.
Contribution
The paper proposes a new scene graph expansion approach using sequential prediction, external knowledge, and novel metrics, improving over existing graph generation methods.
Findings
GEMS outperforms GraphRNN in representing scene graphs.
The method effectively incorporates external knowledge for better generalization.
New metrics provide comprehensive evaluation of predicted relationships.
Abstract
Applications based on image retrieval require editing and associating in intermediate spaces that are representative of the high-level concepts like objects and their relationships rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by adding new nodes (objects) and the corresponding relationships. To this end, we formulate scene graph expansion as a sequential prediction task involving multiple steps of first predicting a new node and then predicting the set of relationships between the newly predicted node and previous nodes in the graph. We propose a sequencing strategy for observed graphs that retains the clustering patterns amongst nodes. In addition, we leverage external knowledge to train our graph generation model,…
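The sequential formulation in the abstract (predict a new node, then predict its relationships to the nodes already present) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `expand_scene_graph`, the two predictor callbacks, and the toy lookup tables are hypothetical stand-ins for the learned models, and the edge direction (new node as subject) is chosen only to keep the example small. It is not the authors' implementation.

```python
from typing import Callable, List, Optional, Tuple

# (subject index, relation label, object index); this layout is an assumption
Edge = Tuple[int, str, int]


def expand_scene_graph(
    nodes: List[str],
    edges: List[Edge],
    predict_node: Callable[[List[str], List[Edge]], Optional[str]],
    predict_edges: Callable[[List[str], List[Edge], int], List[Tuple[int, str]]],
    max_new_nodes: int,
) -> Tuple[List[str], List[Edge]]:
    """Grow a seed graph one node at a time, adding edges after each new node."""
    nodes, edges = list(nodes), list(edges)  # leave the caller's seed untouched
    for _ in range(max_new_nodes):
        # Step 1: propose the next object given the graph so far.
        new_label = predict_node(nodes, edges)
        if new_label is None:  # hypothetical stop signal
            break
        nodes.append(new_label)
        new_idx = len(nodes) - 1
        # Step 2: propose relationships between the new node and earlier nodes.
        for prev_idx, relation in predict_edges(nodes, edges, new_idx):
            edges.append((new_idx, relation, prev_idx))
    return nodes, edges


# Toy lookup tables standing in for learned predictors (hypothetical data).
TOY_NEXT = {"person": "horse", "horse": "grass"}
TOY_REL = {("horse", "person"): "carrying", ("grass", "horse"): "under"}


def toy_predict_node(nodes: List[str], edges: List[Edge]) -> Optional[str]:
    # Look only at the most recent node; returns None when nothing is known.
    return TOY_NEXT.get(nodes[-1])


def toy_predict_edges(
    nodes: List[str], edges: List[Edge], new_idx: int
) -> List[Tuple[int, str]]:
    # Check the new node against every earlier node, in order.
    new_label = nodes[new_idx]
    return [
        (i, TOY_REL[(new_label, nodes[i])])
        for i in range(new_idx)
        if (new_label, nodes[i]) in TOY_REL
    ]


expanded_nodes, expanded_edges = expand_scene_graph(
    ["person"], [], toy_predict_node, toy_predict_edges, max_new_nodes=5
)
print(expanded_nodes)
print(expanded_edges)
```

In a learned system the two callbacks would be replaced by trained models conditioned on the graph built so far; the loop structure is the only part taken from the abstract.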
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
