What GPT Knows About Who is Who
Xiaohan Yang, Eduardo Peynetti, Vasco Meerman, Chris Tanner

TL;DR
This paper investigates the ability of large language models like GPT-2 and GPT-Neo to perform coreference resolution using prompt engineering, revealing their limited and inconsistent capabilities in identifying coreferent mentions.
Contribution
It introduces a QA-based prompt-engineering approach to assess LLMs' coreference resolution abilities, highlighting their limitations and sensitivity to prompts.
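The QA-based prompting idea can be illustrated with a minimal sketch. This page does not give the authors' exact prompt wording, answer parsing, or validity criterion. The template and the helper names (`build_coref_prompt`, `parse_answer`, `is_valid_answer`, `is_correct`) are therefore hypothetical. Treating a "valid" answer as one that names a span of the passage is also an assumption. The sketch only formats the query and scores a continuation; it does not call GPT-2 or GPT-Neo.

```python
import re

def build_coref_prompt(passage: str, mention: str) -> str:
    """Format a QA-style coreference query (hypothetical template, not the paper's)."""
    return (
        f"{passage}\n"
        f'Question: In the passage above, who or what does "{mention}" refer to?\n'
        "Answer:"
    )

def parse_answer(generation: str) -> str:
    """Keep the first line of the model's continuation, minus surrounding periods and spaces."""
    first_line = generation.strip().split("\n", 1)[0]
    return first_line.strip().strip(".").strip()

def is_valid_answer(answer: str, passage: str) -> bool:
    """Assumed validity check: the answer is a non-empty whole-word span of the passage."""
    if not answer:
        return False
    pattern = r"\b" + re.escape(answer) + r"\b"
    return re.search(pattern, passage, flags=re.IGNORECASE) is not None

def is_correct(answer: str, gold_mentions: list[str]) -> bool:
    """A valid answer counts as correct only if it matches a gold coreferent mention."""
    return answer.lower() in {m.lower() for m in gold_mentions}

if __name__ == "__main__":
    passage = "Alice met Bob at the station. She gave him a ticket."
    print(build_coref_prompt(passage, "She"))
    # Stand-in for a model continuation; no model is called in this sketch.
    answer = parse_answer(" Bob.\nQuestion: Who bought the ticket?")
    print(answer, is_valid_answer(answer, passage), is_correct(answer, ["Alice"]))
```

In this toy example the parsed answer "Bob" is valid because it occurs in the passage. It is still incorrect, because the antecedent of "She" is "Alice". A model can therefore return well-formed answers while resolving coreference poorly.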
Findings
GPT-2 and GPT-Neo can produce valid answers
Their coreference identification is limited and inconsistent
Performance is highly prompt-sensitive
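Prompt sensitivity can be quantified in several ways, and this page does not state the paper's measure. One simple option, assumed here for illustration, is to pose the same coreference queries under several prompt templates. Two numbers can then be reported: how often all templates agree, and how much accuracy varies between templates. The function names and the toy answers in the usage note are hypothetical, not results from the paper.

```python
def normalize(answer: str) -> str:
    """Lower-case and trim an answer so trivially different strings compare equal."""
    return answer.strip().strip(".").strip().lower()

def agreement_rate(answers_by_prompt: dict[str, list[str]]) -> float:
    """Fraction of queries for which every prompt template yields the same normalized answer."""
    columns = list(answers_by_prompt.values())
    if not columns or not columns[0]:
        raise ValueError("need at least one template with at least one answer")
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("every template must answer the same queries")
    agree = sum(1 for i in range(n) if len({normalize(col[i]) for col in columns}) == 1)
    return agree / n

def accuracy_spread(
    answers_by_prompt: dict[str, list[str]], gold: list[str]
) -> tuple[dict[str, float], float]:
    """Per-template accuracy against gold antecedents, plus the max-minus-min spread."""
    if not gold or not answers_by_prompt:
        raise ValueError("need gold answers and at least one template")
    acc: dict[str, float] = {}
    for name, answers in answers_by_prompt.items():
        if len(answers) != len(gold):
            raise ValueError(f"template {name!r} answered a different number of queries")
        hits = sum(normalize(a) == normalize(g) for a, g in zip(answers, gold))
        acc[name] = hits / len(gold)
    return acc, max(acc.values()) - min(acc.values())
```

For example, suppose `template_a` answers `["Alice", "Bob", "the station"]` and `template_b` answers `["alice.", "Alice", "the station"]` against gold `["Alice", "Bob", "the station"]`. The templates agree on two of three queries, and their accuracies differ by one third. A large spread under this kind of measure is one concrete reading of "prompt-sensitive".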
Abstract
Coreference resolution, a crucial task for understanding discourse and language at large, has yet to see widespread benefits from large language models (LLMs). Moreover, coreference resolution systems largely rely on supervised labels, which are expensive and difficult to annotate. This makes the task ripe for prompt engineering. In this paper, we introduce a QA-based prompt-engineering method and assess the abilities and limitations of generative, pre-trained LLMs on coreference resolution. Our experiments show that GPT-2 and GPT-Neo can return valid answers, but their ability to identify coreferent mentions is limited and prompt-sensitive, leading to inconsistent results.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Weight Decay · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Softmax · Multi-Head Attention · Attention Dropout
