Reasoning about Fine-grained Attribute Phrases using Reference Games
Jong-Chyi Su, Chenyun Wu, Huaizu Jiang, Subhransu Maji

TL;DR
This paper introduces a framework using reference games to learn and ground fine-grained attribute phrases for describing visual differences, enabling better image retrieval and interpretability across categories.
Contribution
The paper proposes a reference-game approach for learning compositional attribute phrases and demonstrates improved image retrieval and interpretability for fine-grained categories.
Findings
20% improvement in image retrieval accuracy
Ability to interpret unseen descriptions
Effective grounding of attribute phrases
Abstract
We present a framework for learning to describe fine-grained visual differences between instances using attribute phrases. Attribute phrases capture distinguishing aspects of an object (e.g., "propeller on the nose" or "door near the wing" for airplanes) in a compositional manner. Instances within a category can be described by a set of these phrases and collectively they span the space of semantic attributes for a category. We collect a large dataset of such phrases by asking annotators to describe several visual differences between a pair of instances within a category. We then learn to describe and ground these phrases to images in the context of a *reference game* between a speaker and a listener. The goal of a speaker is to describe attributes of an image that allows the listener to correctly identify it within a pair. Data collected in a pairwise manner improves the ability of the…
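The reference game described in the abstract can be illustrated with a toy listener. This is a minimal sketch under stated assumptions, not the authors' model: the embedding vectors, the dot-product scoring, and the names `listener_probs` and `play_round` are all hypothetical and chosen only for illustration. It assumes a speaker's phrase and both images have already been mapped into a shared vector space; the listener then picks the image in the pair that best matches the phrase.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def listener_probs(phrase_vec, image_vecs):
    """Softmax over phrase-image similarity scores (hypothetical scoring rule)."""
    scores = [dot(phrase_vec, img) for img in image_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def play_round(phrase_vec, image_vecs, target_index):
    """One round of the game: the listener guesses which image the phrase describes."""
    probs = listener_probs(phrase_vec, image_vecs)
    guess = max(range(len(probs)), key=probs.__getitem__)
    return guess == target_index, probs


# Toy embeddings (assumed values, not taken from the paper).
phrase = [1.0, 0.0]       # stands in for a phrase such as "propeller on the nose"
target = [0.9, 0.1]       # image the speaker is describing
distractor = [0.1, 0.8]   # the other image in the pair

correct, probs = play_round(phrase, [target, distractor], target_index=0)
print(correct, probs)
```

In this framing, the speaker succeeds when its phrase makes the listener assign higher probability to the target than to the distractor, which is why phrases that highlight a difference between the two images are more useful than phrases that are true of both.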
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
