GUMBridge: a Corpus for Varieties of Bridging Anaphora
Lauren Levine, Amir Zeldes

TL;DR
GUMBridge is a genre-diverse English corpus annotated for bridging anaphora. It covers 16 genres and includes subtype-level annotations, which support analysis of how this phenomenon varies across contexts.
Contribution
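The paper introduces GUMBridge, a large, multi-genre corpus with detailed annotations for bridging anaphora. Existing English bridging resources are mostly small and offer limited coverage of the phenomenon, of genres, or of both, and GUMBridge addresses these gaps to support NLP research on bridging.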
Findings
Annotation quality is high and reliable.
Baseline LLMs struggle with bridging resolution.
Subtype classification remains challenging for LLMs.
Abstract
Bridging is an anaphoric phenomenon where the referent of an entity in a discourse depends on a previous, non-identical entity for its interpretation, as in "There is 'a house'. 'The door' is red," where the door is specifically understood to be the door of the aforementioned house. While there are several existing resources in English for bridging anaphora, most are small, provide limited coverage of the phenomenon, and/or provide limited genre coverage. In this paper, we introduce GUMBridge, a new resource for bridging, which includes 16 diverse genres of English, providing both broad coverage for the phenomenon and granular annotations for the subtype categorization of bridging varieties. We also present an evaluation of annotation quality and report on baseline performance using open- and closed-source contemporary LLMs on three tasks underlying our data, showing that bridging…
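To make the definition concrete, the following is a minimal Python sketch of how the single bridging link in the "house"/"door" example could be represented. The `Mention` and `BridgingLink` classes, the `antecedent_of` helper, and the optional `subtype` field are all hypothetical illustrations. They are not GUMBridge's actual data format or subtype inventory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str      # surface string of the mention
    sentence: int  # index of the sentence containing the mention


@dataclass(frozen=True)
class BridgingLink:
    anaphor: Mention
    antecedent: Mention
    # Hypothetical placeholder; GUMBridge's real subtype labels are not shown here.
    subtype: Optional[str] = None

    def __post_init__(self):
        # Bridging relates two non-identical entities, unlike identity coreference.
        if self.anaphor == self.antecedent:
            raise ValueError("a bridging anaphor must differ from its antecedent")


def antecedent_of(anaphor: Mention, links: List[BridgingLink]) -> Optional[Mention]:
    """Return the entity that `anaphor` bridges to, or None if it has no link."""
    for link in links:
        if link.anaphor == anaphor:
            return link.antecedent
    return None


# "There is a house. The door is red."
house = Mention("a house", 0)
door = Mention("The door", 1)
links = [BridgingLink(anaphor=door, antecedent=house)]

print(antecedent_of(door, links))  # Mention(text='a house', sentence=0)
```

The point of the design is that the door is interpreted through the house without being the same entity. A coreference chain would instead group mentions of one entity.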
Taxonomy
Topics: Natural Language Processing Techniques · Language, Metaphor, and Cognition · Syntax, Semantics, Linguistic Variation
