Scene Graph Modification Based on Natural Language Commands
Xuanli He, Quan Hung Tran, Gholamreza Haffari, Walter Chang, Trung Bui, Zhe Lin, Franck Dernoncourt, Nhan Dam

TL;DR
This paper introduces a novel approach for updating scene graphs based on natural language commands, leveraging graph-based transformers and cross attention, and provides datasets for future research.
Contribution
It presents new models for scene graph modification using transformers and releases datasets, addressing a gap in direct graph manipulation from natural language commands.
Findings
Proposed models outperform previous systems adapted from the machine translation and graph generation literature on scene graph modification.
Introduced large datasets for scene graph editing based on natural language commands.
Demonstrated the effectiveness of graph-based sparse transformers and cross attention.
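To make the idea of graph-based sparse attention more concrete, here is a minimal sketch of one common way to sparsify attention over a graph: each node may attend only to itself and its neighbors, and all other positions are masked out before the softmax. This summary does not state that the paper's model works exactly this way. The function name `masked_attention_weights`, the toy scores, and the adjacency matrix are all assumptions made for illustration.

```python
import math


def masked_attention_weights(scores, adjacency):
    """Row-wise softmax over `scores`, restricted to positions allowed by `adjacency`.

    Hypothetical helper for illustration only. Assumes every row of `adjacency`
    allows at least one position (for example, a self-loop on every node).
    """
    weights = []
    for row_scores, row_mask in zip(scores, adjacency):
        allowed = [s for s, m in zip(row_scores, row_mask) if m]
        top = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - top) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Toy graph with 3 nodes: edges 0-1 and 1-2, plus self-loops (assumed).
adjacency = [
    [True, True, False],
    [True, True, True],
    [False, True, True],
]
# Made-up raw attention scores between every pair of nodes.
scores = [
    [0.0, 0.0, 5.0],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
]
weights = masked_attention_weights(scores, adjacency)
```

Node 0 is not connected to node 2, so the large score of 5.0 between them is ignored and node 0 splits its attention evenly between itself and node 1.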
Abstract
Structured representations like graphs and parse trees play a crucial role in many Natural Language Processing systems. In recent years, the advancements in multi-turn user interfaces necessitate the need for controlling and updating these structured representations given new sources of information. Although there have been many efforts focusing on improving the performance of the parsers that map text to graphs or parse trees, very few have explored the problem of directly manipulating these representations. In this paper, we explore the novel problem of graph modification, where the systems need to learn how to update an existing scene graph given a new user's command. Our novel models based on graph-based sparse transformer and cross attention information fusion outperform previous systems adapted from the machine translation and graph generation literature. We further contribute our…
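The task described in the abstract can be pictured with a small data-structure sketch: a source scene graph, a natural language command, and the target graph the system should produce. The code below applies the edit by hand; it is not the learned model, and the `SceneGraph` class, its `insert` and `delete` methods, the example objects, and the command are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical): object nodes plus (head, relation, tail) edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def insert(self, head, relation, tail):
        # Add both objects and the relation that links them.
        self.nodes.update({head, tail})
        self.edges.add((head, relation, tail))

    def delete(self, node):
        # Remove an object together with every edge that touches it.
        self.nodes.discard(node)
        self.edges = {e for e in self.edges if node not in (e[0], e[2])}


# Source graph for an assumed scene: a girl holding an umbrella and wearing a hat.
source = SceneGraph()
source.insert("girl", "holding", "umbrella")
source.insert("girl", "wearing", "hat")

# Assumed user command: "remove the hat".
# A graph modification system would have to predict this target from
# (source graph, command); here the edit is written out manually.
target = SceneGraph(set(source.nodes), set(source.edges))
target.delete("hat")
```

The pair (`source`, command) is the system's input and `target` is the expected output, which is what distinguishes this setting from parsing text into a graph from scratch.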
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Linear Layer · Cosine Annealing · Dense Connections · Layer Normalization · Multi-Head Attention · Dropout · Linear Warmup With Cosine Annealing · Attention Dropout · Weight Decay · Attention Is All You Need
