NNE: A Dataset for Nested Named Entity Recognition in English Newswire

Nicky Ringland; Xiang Dai; Ben Hachey; Sarvnaz Karimi; Cecile Paris; James R. Curran

arXiv:1906.01359·cs.CL·June 5, 2019

TL;DR

This paper introduces NNE, a comprehensive nested named entity dataset for English newswire, designed to facilitate the development of advanced nested NER techniques by providing detailed multi-layered annotations.

Contribution

The paper presents a large, fine-grained nested NER dataset with extensive annotations, addressing the lack of resources for nested entity recognition in English newswire.

Findings

01 Contains 279,795 mentions of 114 entity types
02 Supports up to 6 layers of nesting
03 Aims to promote new nested NER methods

Abstract

Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE, a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.

Tables (4)

Table 1: Number of spans at each layer of nesting with their most frequent categories.

Depth	Number	%	Three most frequent categories
1	118,525	45.5	corp (22,752), date (15,927), per (13,460)
2	106,144	40.8	cardinal (19,834), name (18,640), unit (14,871)
3	31,573	12.1	cardinal (11,697), mult (5,859), name (3,450)
4	3,813	1.5	cardinal (1,650), mult (1,041), unit (400)
5	327	0.1	cardinal (154), mult (96), unit (51)
6	4	0.0	unit (1), city-state (1), mult (1)

Table 2: NER results on NNE using different methods.

	$P$	$R$	$F_{1}$
BiLSTM-CRF-TOP	89.9	38.0	53.5
BiLSTM-CRF-BOTTOM	93.8	62.0	74.7
BiLSTM-CRF-BOTH	92.2	85.8	88.9
Hypergraph	91.8	91.0	91.4
Transition	77.4	70.1	73.6

Table 3: A comparison between NNE and two commonly used corpora with nested entities.

Item	NNE	GENIA	ACE2005
Documents	2,312	2,000	464
Sentences	49,208	18,546	12,548
Sentences w. nesting	32,387	9,533	4,266
Tokens	1.1M	0.5M	0.3M
Mentions	279,795	92,681	30,966
Entity types	114	36	7
Mentions per sentence	5.69	4.99	2.46
Top-level mentions	118,525	76,582	23,464
Maximum depth	6	4	6

Table 4: Entity categories and their mention frequencies (Appendix A).

Category	Frequency	Category	Frequency	Category	Frequency
CARDINAL	43873	STREET	475	QUANTITY2D	81
NAME	28537	GRPORG	437	PRODUCTFOOD	80
ORGCORP	23339	ORGPOLITICAL	436	SUBURB	78
UNIT	19289	VEHICLE	432	GRPLOC	63
DATE	17381	LAW	419	HOTEL	55
PER	14960	ORGEDU	411	QUANTITYOTHER	55
DURATION	13655	CONTINENT	354	FUND	54
MONEY	12640	BUILDING	346	SONG	54
MULT	7851	SEASON	337	SPACE	53
FIRST	6797	GPE	333	RIVER	52
CITY	6723	FOLD	313	WAR	51
PERCENT	6542	MIDDLE	313	CHEMICAL	45
REL	6170	TIME	296	BRIDGE	44
CORPJARGON	5560	WEIGHT	293	PLAY	42
HON	5524	OCEAN	291	STADIUM	37
NATIONALITY	5193	LOCATIONOTHER	261	AWARD	36
GOVERNMENT	4674	EVENT	260	ORGRELIGIOUS	35
COUNTRY	4047	DISEASE	246	AIRPORT	32
QUAL	3903	QUANTITY1D	220	ANIMATE	29
YEAR	3421	CITYSTATE	220	GOD	29
MONTH	3385	WOA	207	HOSPITAL	25
STATE	3245	TVSHOW	172	ATTRACTION	24
ORDINAL	2590	ELECTRONICS	167	WEAPON	23
IPOINTS	2395	SPORTSTEAM	166	MUSEUM	17
ROLE	2368	DATEOTHER	164	ENERGY	17
RATE	2141	QUANTITY3D	156	SPEED	14
MEDIA	1712	NAMEMOD	155	PAINTING	13
DAY	1631	GRPPER	154	BAND	10
NUMDAY	1495	BOOK	149	SPORTSSEASON	8
INI	1445	ARMY	139	SCINAME	7
NORPOTHER	1247	FACILITY	129	ADDRESSNON	3
ORGOTHER	1099	PRODUCTDRUG	116	ALBUM	3
PERIODIC	1066	HURRICANE	107	TEMPERATURE	2
REGION	864	SPORTSEVENT	100	NATURALDISASTER	2
NORPPOLITICAL	731	RELIGION	99	CONCERT	2
AGE	661	NICKNAME	96	STATION	1
INDEX	657	LANGUAGE	92	BORDER	1
PRODUCTOTHER	656	FILM	89	CHANNEL	1

Full text

NNE: A Dataset for Nested Named Entity Recognition in English Newswire

Nicky Ringland$^{1\dagger}$, Xiang Dai$^{1,2\ddagger}$, Ben Hachey$^{1,3}$, Sarvnaz Karimi$^{2}$, Cecile Paris$^{2}$, James R. Curran$^{1}$

$^{1}$University of Sydney, Sydney, Australia

$^{2}$CSIRO Data61, Sydney, Australia

$^{3}$Digital Health CRC, Sydney, Australia

$\dagger$ [email protected] $\ddagger$ [email protected]

1 Introduction

Named entity recognition—the task of identifying and classifying entity mentions in text—plays a crucial role in understanding natural language. It is used for many downstream language processing tasks, e.g., coreference resolution, question answering, summarization, entity linking, relation extraction and knowledge base population. However, most NER tools are designed to capture flat mention structure over coarse entity type schemas, reflecting the available annotated datasets.

Focusing on flat mention structures ignores important information that can be useful for downstream tasks. Figure 1 includes examples of nested named entities illustrating several phenomena:

• Entity-entity relationships can be embedded in nested mentions. For instance, the location of the ‘Ontario Supreme Court’ is indicated by the embedded state mention ‘Ontario’;

• Entity attribute values can be embedded in nested mentions. For instance, the title is the embedded role ‘Former U.N. Ambassador’, which also encodes the employment relation between the person ‘Jane Kirkpatrick’ and org ‘U.N.’;

• Part-whole relationships can be encoded in nested mention structure. For instance, the region ‘Southern California’ is part of the state ‘California’.

Recent work has demonstrated increasing interest in nested entity structure, including local approaches (Xu et al., 2017; Sohrab and Miwa, 2018), hypergraph-based approaches (Lu and Roth, 2015; Muis and Lu, 2017; Katiyar and Cardie, 2018; Wang and Lu, 2018), cascaded approaches (Alex et al., 2007; Ju et al., 2018), and parsing approaches (Finkel and Manning, 2009; Wang et al., 2018). See Dai (2018) for a survey. Yet these techniques have seen little translation from the research literature to toolsets or downstream applications.

To facilitate ongoing research on nested NER, we introduce NNE—a large, manually-annotated, nested named entity dataset over English newswire. This new annotation layer over the Wall Street Journal portion of the PTB includes 279,795 mentions. All mentions are annotated, including nested structures with depth as high as six layers. A fine-grained entity type schema is used, extending the flat BBN (Weischedel and Brunstein, 2005) annotation from 64 to 114 entity types.

We are publicly releasing the standoff annotations along with detailed annotation guidelines and scripts for knitting annotations onto the underlying PTB corpus (https://github.com/nickyringland/nested_named_entities). Benchmark results using recent state-of-the-art approaches demonstrate that good accuracy is possible, but complexity and run time are open challenges. As a new layer over the already rich collection of PTB annotations, NNE provides an opportunity to explore joint modelling of nested NER and other tasks at an unprecedented scale and detail.

2 The NNE dataset

Annotation Scheme:

BBN (Weischedel and Brunstein, 2005) is a pronoun coreference and entity type corpus, annotated with 64 types of entities, numerical and time expressions. We use its flat entity schema as a starting point to design our schema. We analyzed existing BBN annotations to develop and automatically apply structured pre-annotation for predictable entity types. Additional fine-grained categories and further structural elements of entities, inspired by Sekine et al. (2002) and Nothman et al. (2013), are used to augment the BBN schema. We adhere to the following general principles when annotating nested named entities in the corpus:

• Annotate all named entities, all time and date (timex) and numerical (numex) entities, including all non-sentence initial words in title case, and instances of proper noun mentions that are not capitalized.

• Annotate all structural elements of entities. These elements could be other entities, such as ‘Ontario’ (state) in ‘Ontario Supreme Court’ (government), or structural components such as ‘40’ (cardinal) and ‘miles’ (unit) in ‘40 miles’ (quantity:1d), as well as the internal structure induced by syntactic elements, such as coordination.

• Add consistent substructure to avoid spurious ambiguity. For example, the token ‘Toronto’, which is a city, would be labeled as part of an org:edu organization span ‘University of Toronto’. We add layers of annotations to allow each token to be annotated as consistently as possible, e.g., [University of [Toronto]$_{\textsc{city}}$]$_{\textsc{org:edu}}$.

• Add additional categories to avoid category confusion. Some entities are easy to identify, but difficult to categorize consistently. For instance, a hotel (or any business at a fixed location) has both organizational and locative qualities, or is at least treated metonymously as a location. Rather than requiring annotators to make an ambiguous decision, we elect to add category hotel to simplify the individual annotation decision. We also apply this principle when adding media, fund, and building categories.

• Pragmatic annotation. Many annotation decisions are ambiguous and difficult, thus may require substantial research. For instance, knowing that ‘The Boeing Company’ was named after founder ‘William E. Boeing’ would allow us to annotate ‘Boeing’ with an embedded person entity. However, this does not apply for other companies, such as ‘Sony Corporation’. To let annotation decisions be made without reference to external knowledge, we label all tokens that seem to be the names of people as name, regardless of whether they are actually a person’s name.

Entity types and mention frequencies can be found in Appendix A. See Ringland (2016) for annotation guidelines and extended discussion of annotation decisions.
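To make the layered structure described in these principles concrete, the following is a minimal Python sketch of one way nested mentions could be held in memory and their nesting depth computed. The `Mention` class, its token-offset convention, and the `nesting_depth` helper are illustrative assumptions, not the format of the released standoff files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical in-memory mention: token offsets plus an entity type."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def nesting_depth(mention, mentions):
    """Return 1 for a top-level mention, plus 1 per distinct enclosing span.

    Spans with the same extent (one span carrying several types) are not
    counted as enclosing each other.
    """
    extent = (mention.start, mention.end)
    enclosing = {
        (m.start, m.end)
        for m in mentions
        if m.start <= mention.start and mention.end <= m.end
        and (m.start, m.end) != extent
    }
    return 1 + len(enclosing)


# [University of [Toronto]city]org:edu
tokens = ["University", "of", "Toronto"]
mentions = [Mention(0, 3, "org:edu"), Mention(2, 3, "city")]
depths = {m.label: nesting_depth(m, mentions) for m in mentions}
```

Counting distinct enclosing extents relies on mentions being properly nested (no crossing spans), which is what the annotation principles above describe.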

Annotation Process:

Although some existing annotation tools allow nested structures (e.g., Brat (Stenetorp et al., 2012)), we built a custom tool that makes adding layers of entities simple and fast, and that suggests reusing existing structured annotations for the same span.

Using the annotations from BBN as underlying annotations, the annotator is shown a screen with the target sentence, as well as the previous and next sentences, if any. A view of the whole article is also possible to help the annotator with contextual cues. When annotators select a span, they are prompted with suggestions based on their own previous annotations, and common entities. Some entities are repeated frequently in an article, or over many articles in the corpus. The annotation tool allows a user to add a specified annotation to all strings matching those tokens in the same article, or in all articles.
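The propagation feature described above can be pictured as a simple token-sequence search. The sketch below is only an illustration under assumed data structures (an article as a list of tokenized sentences); the function names `find_matches` and `propagate` are hypothetical and do not come from the annotation tool itself.

```python
def find_matches(tokens, pattern):
    """Return (start, end) offsets of every occurrence of pattern in tokens."""
    n = len(pattern)
    if n == 0:
        return []
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == pattern
    ]


def propagate(article, pattern, label):
    """Apply one annotation to every matching token sequence in an article.

    article is a list of tokenized sentences; the result is a list of
    (sentence_index, start, end, label) tuples.
    """
    return [
        (s_idx, start, end, label)
        for s_idx, sentence in enumerate(article)
        for start, end in find_matches(sentence, pattern)
    ]


article = [
    ["The", "Ontario", "Supreme", "Court", "ruled", "."],
    ["Ontario", "Supreme", "Court", "judges", "agreed", "."],
]
added = propagate(article, ["Ontario", "Supreme", "Court"], "government")
```

Extending the same call over every article in the corpus would correspond to the "in all articles" option.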

Four annotators, each with a background in linguistics and/or computational linguistics, were selected and briefed on the annotation task and purpose. The WSJ portion of the PTB consists of 25 sections (00–24). Each annotator started with a subset of section 00 as annotation training, and was given feedback before moving on to other sections. Weekly meetings were held with all annotators to discuss ambiguities in the guidelines, gaps in the annotation categories, edge cases and ambiguous entities and to resolve discrepancies.

Total annotation time for the corpus was 270 hours, split between the four annotators. Sections 00 and 23 were doubly annotated, and section 02 was annotated by all four annotators. An additional 17 hours was used for adjudicating these sections annotated by multiple annotators.

Dataset Analysis:

The resulting NNE dataset includes a large number of entity mentions of substantial depth, with more than half of mentions occurring inside another mention. Of the 118,525 top-level entity mentions, 47,020 (39.6%) do not have any nested structure embedded. The remaining 71,505 mentions contain 161,270 mentions, averaging 2.25 structural mentions per top-level mention that has nested structure. Note that one span can be assigned multiple entity types. For example, the span ‘1993’ can be annotated as both date and year. In NNE, 19,144 out of 260,386 total spans are assigned multiple types. Table 1 lists the number of spans occurring at each depth.

To measure how clearly the annotation guidelines delineate each category, and how reliable our annotations are, inter-annotator agreement was calculated using annotations on Section 02, which was annotated by all four annotators. An adjudicated version was created by deciding a correct existing candidate label from within the four possibilities, or by adjusting one of them on a token level. For the purposes of inter-annotator agreement, a tag stack is calculated for each word, essentially flattening each token’s nested annotation structure into one label. For example, the tag of token ‘California’ in the third sentence of Figure 1 is state_region, while ‘beach’ is O_O.

Agreement using Fleiss’ kappa over all tokens is 0.907. Considering only tokens that are part of at least one mention according to at least one annotator, Fleiss’ kappa is 0.832. Both results are above the 0.8 threshold for good reliability (Carletta, 1996). Average precision, recall and $F_{1}$ score across four annotators with respect to the adjudicated gold standard are 94.3, 91.8 and 93.0.
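The agreement computation can be sketched in two steps: flatten each token's nested labels into a tag stack, then compute Fleiss' kappa over the per-token labels from all annotators. The code below is a minimal illustration, not the authors' script. The stack order (innermost label first), the padding with `O` to a fixed depth, and the toy sentence are assumptions chosen to reproduce the `state_region` and `O_O` examples.

```python
from collections import Counter


def tag_stacks(n_tokens, mentions, depth=2):
    """Flatten nested mentions into one label per token.

    mentions is a list of (start, end, label) with end exclusive. Labels are
    joined innermost first and padded with "O" up to the assumed depth.
    """
    stacks = []
    for i in range(n_tokens):
        covering = sorted(
            (m for m in mentions if m[0] <= i < m[1]),
            key=lambda m: m[1] - m[0],  # shortest (innermost) span first
        )
        labels = [m[2] for m in covering]
        labels += ["O"] * max(0, depth - len(labels))
        stacks.append("_".join(labels))
    return stacks


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings holds, per item, the label given by each rater.

    Every item must have the same number of raters, and at least two
    distinct labels must occur overall (otherwise the denominator is zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        observed += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = observed / n_items
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Toy fragment: "Southern California beach"
stacks = tag_stacks(3, [(0, 2, "region"), (1, 2, "state")])
kappa = fleiss_kappa([["state_region"] * 4, ["O_O"] * 4])
```

In this toy case all four hypothetical annotators agree on both tokens, so `kappa` is 1.0; on real data each inner list would hold the four annotators' tag stacks for one token.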

3 Benchmark results

We evaluate three existing NER models on our dataset: (1) the standard BiLSTM-CRF model which can handle only flat entities (Lample et al., 2016); (2) hypergraph-based (Wang and Lu, 2018); and, (3) transition-based (Wang et al., 2018) models. The latter two models were proposed to recognize nested mentions. We follow the CoNLL evaluation scheme in requiring an exact match of mention start, end and entity type (Sang and Meulder, 2003). We use section 02 as the development set, sections 23 and 24 as the test set, and the remaining sections as the training set. The model that performs best on the development set is evaluated on the test set for the final result. Since the standard BiLSTM-CRF model cannot handle nested entities, we use either the outermost (BiLSTM-CRF-TOP in Table 2) or the innermost mentions (BiLSTM-CRF-BOTTOM) for training. We also combine the outputs from these two flat NER models, and denote the result as BiLSTM-CRF-BOTH.
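The flat training targets and the exact-match metric can be illustrated with a short Python sketch. It assumes mentions are `(start, end, label)` tuples within a single sentence (a real evaluation would also key on a sentence or document identifier); the helper names are hypothetical, and the code is not the evaluation script used for Table 2.

```python
def contains(outer, inner):
    """True if outer covers inner and the two spans differ in extent."""
    return (
        outer[0] <= inner[0]
        and inner[1] <= outer[1]
        and (outer[0], outer[1]) != (inner[0], inner[1])
    )


def outermost(mentions):
    """Mentions not contained in any other mention (targets for a TOP model)."""
    return {m for m in mentions if not any(contains(o, m) for o in mentions)}


def innermost(mentions):
    """Mentions containing no other mention (targets for a BOTTOM model)."""
    return {m for m in mentions if not any(contains(m, i) for i in mentions)}


def prf(gold, pred):
    """Exact-match precision, recall and F1 over (start, end, label) tuples."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# "Ontario Supreme Court": government span containing a state span.
gold = {(0, 3, "government"), (0, 1, "state")}
top_scores = prf(gold, outermost(gold))
both_scores = prf(gold, outermost(gold) | innermost(gold))
```

Even a flat model that predicts every outermost mention perfectly recovers only half of the gold mentions in this toy sentence, which mirrors the high-precision, low-recall pattern of the single flat models; taking the union of the two flat outputs still misses any mention that sits between the outermost and innermost layers.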

From Table 2, we can see that single flat NER models can achieve high precision but suffer from low recall. For example, the model trained on outermost (top) mentions has 38.0 recall, as around 60% of mentions are nested within others. The hypergraph-based model performs best on our dataset, presumably because it can capture mentions from different levels and does not suffer from issues of structural ambiguity during inference (Muis and Lu, 2017; Wang and Lu, 2018). However, its decoding speed of 9 words per second is slow due to the large number of entity categories in our dataset (the decoding time complexity of the method proposed by Wang and Lu (2018) is $O(cmn)$, where $m$ is the number of entity types, $n$ is the sentence length, and $c$ is the maximal mention length). The transition-based method has a higher decoding speed of 57 words per second, but has much lower precision than flat NER models.
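As a rough illustration of the $O(cmn)$ complexity quoted above (a back-of-the-envelope estimate, not a measurement from the paper): if $c$ and $n$ are held fixed and constant factors are assumed equal, decoding cost $T$ grows linearly in $m$, so moving from the 7 entity types of ACE2005 (Table 3) to the 114 of NNE multiplies the cost by about 16. The second ratio simply compares the two reported decoding speeds.

```latex
\[
\frac{T_{m=114}}{T_{m=7}} \approx \frac{c \cdot 114 \cdot n}{c \cdot 7 \cdot n}
  = \frac{114}{7} \approx 16.3,
\qquad
\frac{57\ \text{words/s}}{9\ \text{words/s}} \approx 6.3 .
\]
```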

4 Related Work

Other corpora with nested entities:

We briefly compare existing annotated English corpora involving nested entities. A comparison of statistics between our dataset and two widely used benchmark datasets is shown in Table 3. The ACE corpora (Mitchell et al., 2004; Walker et al., 2005) consist of data of various types annotated for entities, relations and events. The entity component of ACE is framed in terms of nominal modification, and nested mentions are only annotated in nominal mentions, not inside other named entity mentions. For example, in ACE2005, ‘Secretary of Homeland Security Tom Ridge’ is annotated as a person, containing two other person annotations: ‘Secretary’ and ‘Secretary of Homeland Security’. In contrast, our annotations capture more interactions between different semantic spans: person consisting of role and name, and role containing government.

The GENIA corpus (Kim et al., 2003) is a richly-annotated corpus for bio-text mining that has 36 entity types among 2,000 MEDLINE abstracts. Due to the biomedical domain’s specialized terminology and complex naming conventions, entities of interest, such as genes, proteins or disease names, often nest. For example, the RNA ‘CIITA mRNA’ contains a DNA mention ‘CIITA’.

In addition to these two commonly used nested entity corpora, Byrne (2007) and Alex et al. (2007) introduced datasets with nested entities in historical archive and biomedical domains, respectively. However, their datasets are not publicly available. Four percent of entity mentions annotated in the English entity discovery and linking task in TAC-KBP track include nesting (Ji et al., 2014).

Resources built on the PTB:

Considerable effort has gone into adding syntactic and semantic information to the PTB (Marcus et al., 1993). PropBank (Kingsbury et al., 2002) extended the PTB with the predicate argument relationships between verbs and their arguments. NomBank (Meyers et al., 2004) extended the argument structure for instances of common nouns. Vadas and Curran (2007), and Ficler and Goldberg (2016) extended the PTB with noun phrase and coordination annotations, respectively.

Our dataset is built on top of the PTB and enriches the full ecosystem of resources and systems that stem from it.

5 Summary

We present NNE, a large-scale, nested, fine-grained named entity dataset. We are optimistic that NNE will encourage the development of new NER models that recognize structural information within entities, and therefore understand the fine-grained semantic information it captures. Additionally, our annotations are built on top of the PTB, so the NNE dataset will allow joint learning models to take advantage of semantic and syntactic annotations, and ultimately to understand and exploit the true structure of named entities.

Acknowledgments

We would like to thank annotators for their excellent work: Kellie Webster, Vivian Li, Joanne Yang and Kristy Hughes. We also thank three anonymous reviewers for their insightful comments.

Appendix A Full annotation scheme

Bibliography (29)

1. Beatrice Alex, Barry Haddow, and Claire Grover. 2007. Recognising nested named entities in biomedical text. In BioNLP, pages 65–72.
2. Kate Byrne. 2007. Nested named entity recognition in historical archive text. In ICSC, pages 589–596.
3. Jean Carletta. 1996. Assessing agreement on classification tasks: The Kappa statistic. Comput. Linguist., 22(2):249–254.
4. Xiang Dai. 2018. Recognizing complex entity mentions: A review and future directions. In ACL-SRW, pages 37–44.
5. Jessica Ficler and Yoav Goldberg. 2016. Coordination annotation extension in the Penn tree bank. In ACL, pages 834–842.
6. Jenny Rose Finkel and Christopher Manning. 2009. Nested named entity recognition. In EMNLP, pages 141–150.
7. Heng Ji, Joel Nothman, and Ben Hachey. 2014. Overview of TAC-KBP 2014 entity discovery and linking tasks. In TAC, pages 1333–1339.
8. Meizhi Ju, Makoto Miwa, and Sophia Ananiadou. 2018. A neural layered model for nested named entity recognition. In NAACL, pages 1446–1459.