Weak Semi-Markov CRFs for NP Chunking in Informal Text

Aldrian Obaja Muis; Wei Lu

arXiv:1810.08567·cs.CL·October 22, 2018

TL;DR

This paper presents a new annotated SMS corpus for noun phrase chunking and introduces a semi-Markov CRF variant that matches the accuracy of the conventional semi-CRF while running faster.

Contribution

The paper builds a new annotated corpus of informal text (SMS messages) for NP chunking and proposes a novel semi-CRF variant that runs faster than the conventional semi-CRF.

Findings

01

The new corpus contains 76,490 noun phrases from 26,500 SMS messages.

02

The semi-CRF variant runs significantly faster than the conventional semi-CRF.

03

The new variant achieves accuracy similar to that of the conventional semi-CRF.

Abstract

This paper introduces a new annotated corpus based on an existing informal text corpus: the NUS SMS Corpus (Chen and Kan, 2013). The new corpus includes 76,490 noun phrases from 26,500 SMS messages, annotated by university students. We then explored several graphical models, including a novel variant of the semi-Markov conditional random field (semi-CRF), for the task of noun phrase chunking. We demonstrated through empirical evaluations on the new dataset that the new variant yielded similar accuracy but ran significantly faster than the conventional semi-CRF.
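The abstract compares the new variant against the conventional semi-CRF but does not describe either model. As background only, here is a minimal sketch of Viterbi decoding in a conventional, zeroth-order semi-CRF for NP chunking. It is not the authors' weak semi-CRF. The function `semi_crf_viterbi`, the toy scoring function `toy_score`, the `NOMINAL` word set, the example sentence, and the maximum segment length `max_len` are all hypothetical, and label-transition features are omitted for brevity. The inner loop over candidate segment lengths (up to `max_len`) is what makes semi-CRF decoding costlier than a linear-chain CRF, roughly by a factor of `max_len`.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_crf_viterbi(
    n: int,
    labels: List[str],
    max_len: int,
    score: Callable[[int, int, str], float],
) -> Tuple[float, List[Segment]]:
    """Best segmentation of n tokens under a zeroth-order semi-CRF (sketch)."""
    # best[j] = highest score of any segmentation of tokens[0:j]
    best = [float("-inf")] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        # Consider every segment tokens[i:j] of length 1..max_len ending at j
        for length in range(1, min(max_len, j) + 1):
            i = j - length
            for label in labels:
                s = best[i] + score(i, j, label)
                if s > best[j]:
                    best[j] = s
                    back[j] = (i, label)
    # Follow back-pointers from the end to recover the segments
    segments: List[Segment] = []
    j = n
    while j > 0:
        i, label = back[j]
        segments.append((i, j, label))
        j = i
    segments.reverse()
    return best[n], segments


# Hypothetical toy example: a hand-written scorer stands in for learned weights
tokens = ["i", "saw", "the", "new", "phone"]
NOMINAL = {"i", "the", "new", "phone"}


def toy_score(i: int, j: int, label: str) -> float:
    span = tokens[i:j]
    if label == "O":
        # Outside-NP segments are restricted to single tokens
        return 0.0 if j - i == 1 else float("-inf")
    # NP segments must consist of "nominal" tokens; longer spans score higher
    if all(t in NOMINAL for t in span):
        return float(len(span)) ** 2
    return float("-inf")


print(semi_crf_viterbi(len(tokens), ["NP", "O"], 3, toy_score))
# -> (10.0, [(0, 1, 'NP'), (1, 2, 'O'), (2, 5, 'NP')])
```

In this toy run, "i" and "the new phone" are chunked as NPs and "saw" is labeled O. A real model would replace `toy_score` with a learned, feature-based segment score.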