TL;DR
This paper introduces a method that models the context between objects to improve understanding of referring expressions, leveraging an LSTM and multiple-instance learning to identify relevant regions and their context.
Contribution
It presents a novel approach combining LSTM and MIL to incorporate object context, enhancing referring expression comprehension over previous property-only models.
Findings
Outperforms property-only models on the Google RefExp and UNC RefExp datasets
Grounds a referring expression to its referred region and supporting context (shown qualitatively)
Uses MIL to discover context regions without context-object annotations at training time
Abstract
Referring expressions usually describe an object using properties of the object and relationships of the object with other objects. We propose a technique that integrates context between objects to understand referring expressions. Our approach uses an LSTM to learn the probability of a referring expression, with input features from a region and a context region. The context regions are discovered using multiple-instance learning (MIL) since annotations for context objects are generally not available for training. We utilize max-margin based MIL objective functions for training the LSTM. Experiments on the Google RefExp and UNC RefExp datasets show that modeling context between objects provides better performance than modeling only object properties. We also qualitatively show that our technique can ground a referring expression to its referred region along with the supporting context…
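The abstract's combination of MIL and a max-margin objective can be illustrated with a small sketch. This is not the authors' implementation: the function names, the score matrix, and the particular hinge form are assumptions chosen for illustration. It assumes a model (the LSTM, in the paper) has already produced a score for the expression given each (region, context region) pair; MIL then takes the best-scoring context for each region, and a hinge loss pushes the referred region's score above the other regions' scores by a margin.

```python
def mil_region_scores(pair_scores):
    """Max over candidate context regions for each region (the MIL step).

    pair_scores[i][j] is a hypothetical score of the expression given
    region i paired with context region j (e.g. a log-probability).
    Returns the best score per region and the index of the context
    region that achieved it.
    """
    scores, contexts = [], []
    for row in pair_scores:
        best_j = max(range(len(row)), key=lambda j: row[j])
        scores.append(row[best_j])
        contexts.append(best_j)
    return scores, contexts


def max_margin_mil_loss(pair_scores, pos_idx, margin=1.0):
    """Hinge loss between the referred region and every other region.

    This is one plausible max-margin form, assumed for illustration;
    the paper's exact objectives may differ.
    """
    scores, _ = mil_region_scores(pair_scores)
    pos = scores[pos_idx]
    return sum(
        max(0.0, margin - pos + neg)
        for i, neg in enumerate(scores)
        if i != pos_idx
    )


# Toy scores for 3 regions x 3 candidate context regions (made-up values).
pair_scores = [
    [-2.0, -0.5, -3.0],
    [-1.5, -4.0, -2.5],
    [-3.5, -3.0, -1.0],
]

region_scores, best_context = mil_region_scores(pair_scores)
predicted_region = max(range(len(region_scores)), key=lambda i: region_scores[i])
loss = max_margin_mil_loss(pair_scores, pos_idx=0, margin=1.0)

print(predicted_region, best_context[predicted_region], loss)
```

At inference time, the same max over context regions yields both the predicted region and the context region that supports it, which is how the grounding of supporting context described in the abstract can fall out of the model without context annotations.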
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
