MBT: A Memory-Based Part of Speech Tagger-Generator

Walter Daelemans (U. Tilburg; U. Antwerp); Jakub Zavrel (U. Tilburg); Peter Berck (U. Antwerp); Steven Gillis (U. Antwerp)

arXiv:cmp-lg/9607012 · cmp-lg · February 3, 2008 · 261 cites

TL;DR

This paper presents a memory-based method for part of speech tagging that automatically builds a tagger from a tagged corpus. It reports accuracy comparable to statistical methods, with additional advantages such as a relatively small training corpus and fast processing.

Contribution

It introduces a memory-based approach using IGTree for efficient, scalable, and accurate part of speech tagging with dynamic context size determination.

Findings

01  Achieves tagging accuracy comparable to statistical methods.

02  Demonstrates the feasibility of large-scale memory-based tagging.

03  Offers advantages such as a relatively small training corpus and fast learning.

Abstract

We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv)…
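
The core idea in the abstract, extrapolating a word's tag from the most similar stored cases, can be illustrated with a small sketch. This is not the paper's implementation: it uses a plain unweighted overlap count and a linear scan of memory, whereas the paper is described above as using IGTree. The feature choice (previous tag, focus word, next word), the toy corpus, and the names `make_cases`, `overlap`, `tag_sentence`, and `MEMORY` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus of (word, tag) sentences; not data from the paper.
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("can", "NOUN"), ("leaks", "VERB")],
    [("dogs", "NOUN"), ("can", "AUX"), ("bark", "VERB")],
]


def make_cases(tagged_sentence):
    """Turn one tagged sentence into (features, tag) cases.

    Assumed feature vector: (previous tag, focus word, next word).
    """
    words = [w for w, _ in tagged_sentence]
    tags = [t for _, t in tagged_sentence]
    cases = []
    for i, (word, tag) in enumerate(tagged_sentence):
        prev_tag = tags[i - 1] if i > 0 else "<s>"
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        cases.append(((prev_tag, word, next_word), tag))
    return cases


def overlap(a, b):
    """Similarity = number of feature positions with identical values."""
    return sum(x == y for x, y in zip(a, b))


def tag_sentence(words, memory):
    """Tag left to right, reusing each predicted tag as context for the next word."""
    tags = []
    prev_tag = "<s>"
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        query = (prev_tag, word, next_word)
        # Find the best similarity score, then let all equally similar cases vote.
        best = max(overlap(query, feats) for feats, _ in memory)
        votes = Counter(tag for feats, tag in memory if overlap(query, feats) == best)
        prev_tag = votes.most_common(1)[0][0]
        tags.append(prev_tag)
    return tags


# "Training" is just storing every case in memory.
MEMORY = [case for sentence in CORPUS for case in make_cases(sentence)]

print(tag_sentence(["dogs", "can", "sleep"], MEMORY))  # ['NOUN', 'AUX', 'VERB']
```

In this sketch the ambiguous word "can" is tagged AUX because the stored case with a preceding NOUN tag matches on two features, while the case where "can" is a NOUN matches on only one. Adding a new tagged sentence only requires appending its cases to `MEMORY`, which loosely mirrors the incremental learning advantage listed in the abstract.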

Code & Models

Repositories

mikekestemont/anthem

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems