Using Syntax to Ground Referring Expressions in Natural Images
Volkan Cirik, Taylor Berg-Kirkpatrick, Louis-Philippe Morency

TL;DR
GroundNet is a neural network that uses a syntactic parse of the referring expression to structure its computation. It improves localization of supporting objects and interpretability, while keeping target-object localization accuracy comparable to previous methods.
Contribution
This paper introduces GroundNet, the first model to incorporate syntactic parse trees into the architecture for referring expression grounding.
Findings
GroundNet outperforms previous methods at localizing the supporting objects mentioned in an expression.
It maintains target-object localization accuracy comparable to those methods.
The approach enhances interpretability by linking phrases to image objects.
Abstract
We introduce GroundNet, a neural network for referring expression recognition -- the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression in order to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track…
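The idea of mapping a parse tree onto a composed graph of modules can be illustrated with a small toy sketch. This is not the authors' implementation. The names `Box`, `Node`, `locate`, `relate`, and `ground` are hypothetical, the "modules" are hand-written scoring rules rather than learned networks, and each box carries a text label in place of visual features. The sketch assumes only noun phrases (NP) and prepositional phrases (PP), and a single preposition, "left of".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Box:
    label: str  # toy stand-in for the visual features of a candidate box
    x: float    # horizontal center, 0.0 (left) to 1.0 (right)


@dataclass
class Node:
    kind: str   # "NP" or "PP" (the only constituent types in this sketch)
    head: str   # head words of the constituent
    children: List["Node"] = field(default_factory=list)


def normalize(scores: List[float]) -> List[float]:
    total = sum(scores)
    return [s / total for s in scores]


def locate(phrase: str, boxes: List[Box]) -> List[float]:
    """Hypothetical 'locate' module: score boxes by word match with the phrase."""
    words = set(phrase.lower().split())
    return normalize([1.0 if b.label in words else 0.05 for b in boxes])


def relate(prep: str, support: List[float], boxes: List[Box]) -> List[float]:
    """Hypothetical relation module: score boxes relative to a supporting object."""
    if prep != "left of":
        raise ValueError(f"unsupported preposition in this sketch: {prep}")
    scores = [
        sum(p for p, other in zip(support, boxes) if b.x < other.x) + 1e-6
        for b in boxes
    ]
    return normalize(scores)


def phrase_of(node: Node) -> str:
    return " ".join([node.head] + [phrase_of(c) for c in node.children])


def ground(node: Node, boxes: List[Box], trace: Dict[str, int]) -> List[float]:
    """Recursively compose modules following the shape of the parse tree."""
    if node.kind == "NP":
        probs = locate(node.head, boxes)
        for child in node.children:
            child_probs = ground(child, boxes, trace)
            probs = normalize([p * q for p, q in zip(probs, child_probs)])
    elif node.kind == "PP":
        support = ground(node.children[0], boxes, trace)
        probs = relate(node.head, support, boxes)
    else:
        raise ValueError(f"unsupported constituent type: {node.kind}")
    # Record which box each phrase points to, so intermediate steps can be inspected.
    trace[phrase_of(node)] = max(range(len(boxes)), key=probs.__getitem__)
    return probs


boxes = [Box("cup", 0.2), Box("cup", 0.8), Box("laptop", 0.5)]
tree = Node("NP", "cup", [Node("PP", "left of", [Node("NP", "laptop")])])
trace: Dict[str, int] = {}
probs = ground(tree, boxes, trace)
print(trace)
```

Because every constituent gets its own distribution over boxes, the `trace` dictionary shows both the target ("cup left of laptop") and the supporting object ("laptop"), which is the kind of phrase-to-object inspection the abstract describes.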
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Natural Language Processing Techniques
