MBT: A Memory-Based Part of Speech Tagger-Generator
Walter Daelemans (U. Tilburg, U. Antwerp), Jakub Zavrel (U. Tilburg), Peter Berck (U. Antwerp), Steven Gillis (U. Antwerp)

TL;DR
This paper presents a memory-based method for part of speech tagging that automatically builds a tagger from a tagged corpus. Its accuracy is comparable to that of statistical methods, with additional advantages such as a relatively small training corpus and fast processing.
Contribution
It introduces a memory-based approach that uses IGTree for efficient, scalable, and accurate part of speech tagging, with the context size determined dynamically.
Findings
Achieves tagging accuracy comparable to statistical methods.
Demonstrates feasibility of large-scale memory-based tagging.
Offers advantages such as a small training corpus requirement and fast learning.
Abstract
We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv)…
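The core idea in the abstract, extrapolating a word's tag from the most similar cases held in memory, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy case base, the three-feature context (previous tag, focus word, next word), the tag names, and the unweighted overlap metric. It is a plain nearest-neighbour lookup, not the IGTree method named in the summary above.

```python
from collections import Counter

# Hypothetical toy case base. Each case pairs a context with the tag of its
# focus word; the context is (previous tag, focus word, next word).
CASES = [
    (("<s>", "the", "dog"), "DET"),
    (("DET", "dog", "barks"), "NOUN"),
    (("NOUN", "barks", "</s>"), "VERB"),
    (("<s>", "the", "run"), "DET"),
    (("DET", "run", "ended"), "NOUN"),
    (("NOUN", "ended", "</s>"), "VERB"),
    (("<s>", "dogs", "run"), "NOUN"),
    (("NOUN", "run", "</s>"), "VERB"),
]


def overlap(a, b):
    """Count the feature positions where two contexts agree."""
    return sum(x == y for x, y in zip(a, b))


def tag_word(features, cases=CASES):
    """Return the majority tag among the stored cases most similar to features."""
    best = max(overlap(features, stored) for stored, _ in cases)
    votes = Counter(tag for stored, tag in cases if overlap(features, stored) == best)
    return votes.most_common(1)[0][0]


def tag_sentence(words, cases=CASES):
    """Tag left to right, feeding each predicted tag into the next context."""
    tags = []
    prev_tag = "<s>"
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else "</s>"
        tag = tag_word((prev_tag, word, next_word), cases)
        tags.append(tag)
        prev_tag = tag
    return tags


if __name__ == "__main__":
    print(tag_sentence(["the", "run", "ended"]))
    print(tag_sentence(["dogs", "run"]))
    print(tag_sentence(["the", "cat", "barks"]))
```

In this toy case base the word "run" is stored both as a noun and as a verb, and the surrounding context decides which stored case is nearest. The unseen word "cat" is still tagged, because its context partly matches a stored case. Adding a new tagged example only requires appending to the case list, which is one way to picture the incremental learning mentioned in the abstract.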
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
