Who are you referring to? Coreference resolution in image narrations

Arushi Goel; Basura Fernando; Frank Keller; Hakan Bilen

arXiv:2211.14563 · cs.CV · March 20, 2023 · 1 citation

TL;DR

This paper introduces a new dataset and a weakly supervised model for coreference resolution in long image narrations. The model shows large gains over strong baselines and improves the grounding of narratives in images.

Contribution

It presents a novel dataset of image narrations annotated with coreference chains and their bounding boxes, and a weakly supervised method for coreference resolution that uses prior linguistic knowledge as a regularizer.

Findings

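01. The model outperforms several strong baselines in coreference resolution.
02. Coreference resolution improves the grounding of narratives in images.
03. The new dataset, with annotated coreference chains and bounding boxes, enables training and evaluation of coreference resolution on image narrations.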

Abstract

Coreference resolution aims to identify words and phrases that refer to the same entity in a text, a core task in natural language processing. In this paper, we extend this task to resolving coreferences in long-form narrations of visual scenes. First, we introduce a new dataset with annotated coreference chains and their bounding boxes, as most existing image-text datasets only contain short sentences without coreferring expressions or labeled chains. We propose a new technique that learns to identify coreference chains using weak supervision, relying only on image-text pairs and a regularization based on prior linguistic knowledge. Our model yields large performance gains over several strong baselines in resolving coreferences. We also show that coreference resolution helps improve the grounding of narratives in images.
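To make the output of the task concrete, here is a minimal sketch of one common way to turn pairwise "these two mentions corefer" decisions into coreference chains, using union-find. It is an illustrative assumption, not the paper's model or dataset format: the `Mention` class, the `build_chains` function, the example narration, its links and the box coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention: a token span in the narration plus an optional box."""
    text: str
    start: int  # index of the first token
    end: int    # index one past the last token
    # Hypothetical bounding box (x1, y1, x2, y2) grounding the mention in the image.
    box: Optional[tuple[float, float, float, float]] = None


def build_chains(mentions: list[Mention],
                 links: list[tuple[int, int]]) -> list[list[Mention]]:
    """Group mentions into coreference chains from pairwise links via union-find.

    Assumes `mentions` is sorted by position in the narration.
    """
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the sets of every linked pair of mentions.
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect mentions by their set representative, preserving narration order.
    groups: dict[int, list[Mention]] = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m)
    # Order chains by the position of their first mention.
    return sorted(groups.values(), key=lambda chain: chain[0].start)


if __name__ == "__main__":
    # Hypothetical narration: "A woman is holding a dog . She is smiling and it is looking at her ."
    mentions = [
        Mention("A woman", 0, 2, box=(10.0, 20.0, 120.0, 300.0)),
        Mention("a dog", 4, 6, box=(90.0, 180.0, 200.0, 290.0)),
        Mention("She", 7, 8),
        Mention("it", 11, 12),
        Mention("her", 15, 16),
    ]
    links = [(0, 2), (2, 4), (1, 3)]  # hypothetical pairwise coreference decisions
    for chain in build_chains(mentions, links):
        print([m.text for m in chain])
    # prints ['A woman', 'She', 'her'] then ['a dog', 'it']
```

Union-find makes the chains transitive: linking "A woman" to "She" and "She" to "her" places all three in one chain, even though no direct link joins the first and last mention. In a grounded setting, a chain's boxes can then be shared across its pronouns, which is one intuition for why coreference can help grounding.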

Videos

Who Are You Referring To? Coreference Resolution In Image Narrations (YouTube)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling