Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling

Gábor Berend

arXiv:1612.07130·cs.CL·December 22, 2016

Open Access

TL;DR

This paper introduces a sequence labeling framework that uses only sparse indicator features derived from dense word embeddings. It achieves near state-of-the-art results in multilingual POS tagging and NER, and its POS tagging accuracy degrades little when training data is scarce.

Contribution

It presents a novel approach that leverages sparse coding of neural embeddings for effective multilingual sequence labeling without modifying the original embeddings.

Findings

01

Achieves near state-of-the-art performance in POS tagging and NER across multiple languages.

02

Retains over 89.8% of its average POS tagging accuracy when trained on only 1.2% of the available training data (150 sentences per language).

03

Relies on only a few thousand features derived from sparse coding, with no modification of the word representations used for the different tasks.

Abstract

In this paper we propose and carefully evaluate a sequence labeling framework which solely utilizes sparse indicator features derived from dense distributed word representations. The proposed model obtains (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition for a variety of languages. Our model relies only on a few thousand sparse coding-derived features, without applying any modification of the word representations employed for the different tasks. The proposed model has favorable generalization properties, as it retains over 89.8% of its average POS tagging accuracy when trained on 1.2% of the total available training data, i.e. 150 sentences per language.
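
To make the idea of "sparse indicator features derived from dense distributed word representations" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes a fixed, hand-written dictionary (in practice the dictionary would be learned from the embeddings), uses a basic ISTA solver for the L1-regularized reconstruction, and encodes each nonzero coefficient as a signed index such as "P0" or "N2". The function names, the regularization weight `lam`, and the feature naming scheme are all illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(x, dictionary, lam=0.1, n_iter=200):
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    x is a dense embedding (list of d floats); dictionary is a list of k atoms,
    each a list of d floats. The dictionary is assumed given here.
    """
    k, d = len(dictionary), len(x)
    # The squared Frobenius norm of D upper-bounds the gradient's Lipschitz constant.
    lipschitz = sum(v * v for atom in dictionary for v in atom) or 1.0
    alpha = [0.0] * k
    for _ in range(n_iter):
        # Residual of the current reconstruction: D @ alpha - x.
        residual = [
            sum(dictionary[j][i] * alpha[j] for j in range(k)) - x[i]
            for i in range(d)
        ]
        for j in range(k):
            grad_j = sum(dictionary[j][i] * residual[i] for i in range(d))
            alpha[j] = soft_threshold(alpha[j] - grad_j / lipschitz, lam / lipschitz)
    return alpha


def indicator_features(alpha, tol=1e-8):
    """Turn nonzero coefficients into signed indicator features (illustrative scheme)."""
    return [("P" if a > 0 else "N") + str(j) for j, a in enumerate(alpha) if abs(a) > tol]


# Toy example: a 3-dimensional "embedding" and an identity dictionary (hypothetical data).
toy_dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_embedding = [1.0, 0.05, -0.8]
toy_alpha = sparse_code(toy_embedding, toy_dictionary, lam=0.1)
print(indicator_features(toy_alpha))  # ['P0', 'N2']
```

The small second coordinate is driven exactly to zero by the L1 penalty, so the word is described by two discrete features instead of three real values. A sequence labeler would then consume such indicator features in place of the dense vector.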

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis