Neural Modeling for Named Entities and Morphology (NEMO^2)

Dan Bareket; Reut Tsarfaty

arXiv:2007.15620·cs.CL·September 14, 2021


TL;DR

This paper introduces a hybrid neural architecture for Named Entity Recognition in morphologically rich languages like Hebrew, demonstrating that modeling morphological boundaries improves performance over traditional pipeline methods.

Contribution

The paper proposes a novel hybrid neural model that jointly addresses morphological boundaries and NER, outperforming standard sequential approaches in Hebrew.

Findings

01. Explicitly modeling morphological boundaries improves NER performance on Hebrew.

02. The hybrid architecture outperforms the traditional pipeline approach.

03. Sets new benchmarks for Hebrew NER and morphology.

Abstract

Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically-Rich Languages (MRLs) pose a challenge to this basic formulation, as the boundaries of Named Entities do not necessarily coincide with token boundaries; rather, they respect morphological boundaries. To address NER in MRLs we then need to answer two fundamental questions, namely, what are the basic units to be labeled, and how can these units be detected and classified in realistic settings, i.e., where no gold morphology is available. We empirically investigate these questions on a novel NER benchmark, with parallel token-level and morpheme-level NER annotations, which we develop for Modern Hebrew, a morphologically rich-and-ambiguous language. Our results show that explicitly modeling morphological boundaries leads to improved NER performance, and…
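To make the token-versus-morpheme distinction concrete, here is a minimal Python sketch. It is not the authors' code: the toy sentence, the BIO label set, the helper names `extract_entities` and `to_token_level`, and the projection rule (a token takes the first non-O label among its morphemes) are all assumptions made for illustration. The Hebrew token "לישראל" ("to Israel") consists of the prefix "ל" ("to") and the name "ישראל" ("Israel"), so a token-level span includes the preposition while a morpheme-level span does not.

```python
from typing import List, Tuple

Unit = Tuple[str, str]   # (surface form, BIO label)
Token = List[Unit]       # one space-delimited token = list of labeled morphemes


def extract_entities(units: List[Unit]) -> List[Tuple[str, str]]:
    """Collect (entity text, entity type) pairs from a BIO-labeled sequence."""
    entities: List[Tuple[str, str]] = []
    forms: List[str] = []
    etype = None
    for form, label in units:
        # Continue the open entity only if the I- tag matches its type.
        if label.startswith("I-") and etype == label[2:]:
            forms.append(form)
            continue
        # Otherwise close any open entity before handling this unit.
        if forms:
            entities.append((" ".join(forms), etype))
            forms, etype = [], None
        if label.startswith("B-"):
            forms, etype = [form], label[2:]
    if forms:
        entities.append((" ".join(forms), etype))
    return entities


def to_token_level(sentence: List[Token]) -> List[Unit]:
    """Project morpheme labels onto whole tokens.

    Assumptions for this toy example: the token surface is the plain
    concatenation of its morphemes, and the token inherits the first
    non-O morpheme label (or O if there is none).
    """
    projected: List[Unit] = []
    for morphemes in sentence:
        surface = "".join(form for form, _ in morphemes)
        label = next((lab for _, lab in morphemes if lab != "O"), "O")
        projected.append((surface, label))
    return projected


# Hypothetical toy sentence: "I traveled to Israel".
sentence: List[Token] = [
    [("נסעתי", "O")],                      # "I traveled"
    [("ל", "O"), ("ישראל", "B-GPE")],      # "to" + "Israel"
]

morpheme_units = [unit for token in sentence for unit in token]
morpheme_entities = extract_entities(morpheme_units)
token_entities = extract_entities(to_token_level(sentence))

print("morpheme-level:", morpheme_entities)
print("token-level:   ", token_entities)
```

At the morpheme level the extracted entity is the bare name; at the token level it is the whole token, prefix included. That mismatch is what the abstract means when it says entity boundaries respect morphological boundaries rather than token boundaries.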

Code & Models

Datasets

imvladikon/bmc
dataset · 35 dl
