Synapse at CAp 2017 NER challenge: Fasttext CRF

Damien Sileo; Camille Pradel; Philippe Muller; Tim Van de Cruys

arXiv:1709.04820 · cs.CL · September 15, 2017 · 2 cites

TL;DR

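This paper presents a named entity recognition system for French tweets in which features learned without supervision on a larger set of French tweets, including fastText embeddings, feed a CRF model. It ranked first in the CAp 2017 NER challenge without any gazetteer or structured external data and is, to the authors' knowledge, the first NER system to use fastText embeddings, which include subword representations.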

Contribution

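To the authors' knowledge, the first NER system to use fastText embeddings (which include subword representations) together with an embedding-based sentence representation. It ranked first in the CAp 2017 challenge on French tweets without any gazetteer or structured external data.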

Findings

01 Ranked first in the CAp 2017 NER challenge with an F-measure of 58.89%

02 Uses fastText embeddings with subword representations, learned without supervision on a larger dataset of French tweets

03 Uses no gazetteer or structured external data

Abstract

We present our system for the CAp 2017 NER challenge which is about named entity recognition on French tweets. Our system leverages unsupervised learning on a larger dataset of French tweets to learn features feeding a CRF model. It was ranked first without using any gazetteer or structured external data, with an F-measure of 58.89%. To the best of our knowledge, it is the first system to use fasttext embeddings (which include subword representations) and an embedding-based sentence representation for NER.
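The abstract's mention of subword representations feeding a CRF can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`char_ngrams`, `toy_ngram_vector`, `word_vector`, `token_features`), the n-gram range of 3 to 6, the 4-dimensional vectors, and the hash-derived values standing in for learned embeddings are all assumptions made for illustration. The sketch only shows the general idea that a word is broken into boundary-marked character n-grams, that the n-gram vectors are combined into a word vector, and that the vector's components can serve as per-token numeric features of the kind a CRF toolkit could consume.

```python
import hashlib
from typing import Dict, List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def toy_ngram_vector(gram: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a learned n-gram vector (dim must be <= 16).

    A real model would look the vector up in a trained embedding table;
    here the values are derived from a hash so the sketch is deterministic.
    """
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return [digest[k] / 255.0 - 0.5 for k in range(dim)]


def word_vector(word: str, dim: int = 4) -> List[float]:
    """Average the vectors of a word's character n-grams."""
    # Fall back to the wrapped word itself when it is too short for any n-gram.
    grams = char_ngrams(word) or [f"<{word}>"]
    vectors = [toy_ngram_vector(g, dim) for g in grams]
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def token_features(tokens: List[str], i: int, dim: int = 4) -> Dict[str, float]:
    """Per-token feature dict: one numeric feature per embedding dimension."""
    vec = word_vector(tokens[i].lower(), dim)
    return {f"emb_{k}": vec[k] for k in range(dim)}


if __name__ == "__main__":
    tweet = ["Bonjour", "Paris"]
    print(char_ngrams("le", 3, 3))
    print([token_features(tweet, i) for i in range(len(tweet))])
```

Because vectors are built from character n-grams, a misspelled or unseen word still receives a vector that shares components with similar known words, which is a plausible reason subword embeddings suit noisy tweet text. For instance, `char_ngrams("le", 3, 3)` yields `["<le", "le>"]`.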

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies

Methods: fastText · Conditional Random Field