The ApposCorpus: A new multilingual, multi-domain dataset for factual appositive generation
Yova Kementchedjhieva, Di Lu, Joel Tetreault

TL;DR
This paper introduces the ApposCorpus, a multilingual, multi-domain dataset for generating appositive noun phrases that supply background information about named entities, and shows that standard generation methods leave substantial room for improvement on the task.
Contribution
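The paper presents a new, more realistic, end-to-end definition of appositive generation, instantiated by a dataset spanning four languages (English, Spanish, German and Polish), two entity types (person and organization) and two domains (Wikipedia and News), and analyzes the modeling challenges the task poses.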
Findings
Standard language generation methods perform poorly on the task.
An extensive analysis of the data and the task points to various modeling challenges.
The task remains non-trivial with ample room for future research.
Abstract
News articles, image captions, product reviews and many other texts mention people and organizations whose name recognition could vary for different audiences. In such cases, background information about the named entities could be provided in the form of an appositive noun phrase, either written by a human or generated automatically. We expand on the previous work in appositive generation with a new, more realistic, end-to-end definition of the task, instantiated by a dataset that spans four languages (English, Spanish, German and Polish), two entity types (person and organization) and two domains (Wikipedia and News). We carry out an extensive analysis of the data and the task, pointing to the various modeling challenges it poses. The results we obtain with standard language generation methods show that the task is indeed non-trivial, and leaves plenty of room for improvement.
