Named Entity Extraction with Finite State Transducers

Diego Alexander Huérfano Villalba; Elizabeth León Guzmán

arXiv:2006.11548·cs.CL·June 23, 2020




TL;DR

This paper presents an automaton-based named entity tagging system that is simple, tags in linear time, needs minimal linguistic resources, and can be applied to other languages without substantial changes; it reaches an overall F1 score of 60% on the Spanish CoNLL-2002 data.

Contribution

It introduces a novel automaton-based approach for named entity recognition that requires minimal linguistic knowledge and can be easily adapted to multiple languages.

Findings

1. Achieved an overall F1 score of 60% on the Spanish CoNLL-2002 data set.
2. Developed a linear-time tagging system based on finite state transducers.
3. Presented an algorithm for constructing the final transducer that encodes all the learned contextual rules.
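In the CoNLL-2002 evaluation the reported figure is computed over whole entities: precision $P$ is the fraction of predicted entities that are correct, recall $R$ is the fraction of gold entities that are found, and $F_{\beta=1} = 2PR/(P+R)$. The following is a minimal Python sketch of that computation, assuming IOB2 tags (B-X, I-X, O) and counting an entity as correct only when both its type and its exact boundaries match the gold annotation. The helpers `extract_spans` and `f_beta1` are hypothetical illustrations, not the official evaluation script.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence tagged in IOB2 (assumed scheme)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        # An open entity ends at "O", at any "B-" tag, or at an "I-" of another type.
        if etype is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.add((etype, start, i))
            etype = None
        # Open a new entity at "B-"; a stray "I-" also opens one (a lenient assumption).
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            etype, start = tag[2:], i
    return spans


def f_beta1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F(beta=1) over parallel sentences."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = extract_spans(g), extract_spans(p)
        correct += len(gs & ps)  # exact match on type and boundaries
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(f_beta1(gold, pred))  # (0.5, 0.5, 0.5)
```

Here one of the two predicted entities is correct and one of the two gold entities is found, so precision, recall and $F_{\beta=1}$ are all 0.5; a mislabelled type counts as both a false positive and a missed entity.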

Abstract

We describe a named entity tagging system that requires minimal linguistic knowledge and can be applied to other target languages without substantial changes. The system is based on the ideas of Brill's tagger, which makes it very simple. Using supervised machine learning, we construct a series of automata (or transducers) in order to tag a given text. The final model is composed entirely of automata and requires linear time for tagging. It was tested with the Spanish data set provided in CoNLL-2002, attaining an overall $F_{\beta=1}$ measure of 60%. We also present an algorithm for the construction of the final transducer used to encode all the learned contextual rules.
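The idea of linear-time tagging with a single final transducer can be illustrated with a small sketch. It assumes, as a simplification that is not necessarily the paper's rule template, that each Brill-style contextual rule has the form "change tag src to dst when the previous tag is prev", with the context read from the tags as they were before that rule ran. Each rule then becomes a deterministic letter-to-letter transducer, and a standard product construction composes two such machines into one. This is a generic construction, not the authors' published algorithm; the names `rule_transducer`, `compose`, `transduce`, the `<s>` start state, the tag set and the two rules are all hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

START = "<s>"  # hypothetical start state: no previous tag read yet

# Deterministic letter-to-letter transducer: (start state, table) where the
# table maps (state, input tag) -> (output tag, next state).
Table = Dict[Tuple[Hashable, str], Tuple[str, Hashable]]
Transducer = Tuple[Hashable, Table]


def rule_transducer(src: str, dst: str, prev: str, alphabet: List[str]) -> Transducer:
    """Encode the rule 'rewrite src as dst when the previous input tag is prev'.

    The state is just the last input tag read, so one symbol of left context
    is remembered and the machine stays deterministic. dst must be in alphabet.
    """
    table: Table = {}
    for q in [START] + alphabet:
        for a in alphabet:
            out = dst if (a == src and q == prev) else a
            table[(q, a)] = (out, a)
    return START, table


def compose(t1: Transducer, t2: Transducer, alphabet: List[str]) -> Transducer:
    """Product construction: each output tag of t1 is fed straight into t2.

    Only state pairs reachable from the start pair are built (breadth-first).
    """
    start = (t1[0], t2[0])
    table: Table = {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        q1, q2 = state
        for a in alphabet:
            b, n1 = t1[1][(q1, a)]  # first rule rewrites a as b
            c, n2 = t2[1][(q2, b)]  # second rule sees b and emits c
            nxt = (n1, n2)
            table[(state, a)] = (c, nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, table


def transduce(t: Transducer, tags: List[str]) -> List[str]:
    """Rewrite a tag sequence in one left-to-right pass (one lookup per token)."""
    state, out = t[0], []
    for a in tags:
        b, state = t[1][(state, a)]
        out.append(b)
    return out


# Hypothetical tag set and rules, chosen only for illustration.
ALPHABET = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
r1 = rule_transducer("B-LOC", "I-LOC", prev="B-LOC", alphabet=ALPHABET)
r2 = rule_transducer("I-PER", "I-LOC", prev="I-LOC", alphabet=ALPHABET)
final = compose(r1, r2, ALPHABET)

tags = ["B-LOC", "B-LOC", "I-PER"]
print(transduce(final, tags))  # ['B-LOC', 'I-LOC', 'I-LOC']
```

Running r1 and then r2 as two separate passes gives the same output as the single composed machine, because r2 sees the tag that r1 just emitted. Chaining further `compose` calls folds a whole rule list into one transducer: tagging stays one table lookup per token, while the number of states is bounded by the product of the individual state counts (only reachable pairs are built).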


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression