Tagset Design and Inflected Languages
David Elworthy

TL;DR
This paper investigates how the design of a tagset affects tagging accuracy in English, French and Swedish, argues for linguistically motivated (external) design criteria, and discusses the problems of tagging unknown words in inflected languages.
Contribution
It argues that tagset design should follow external (linguistic) criteria rather than internal ones, and discusses problems with tagging unknown words in inflected languages.
Findings
External (linguistic) criteria, rather than internal ones, should guide tagset design
Linguistic considerations are crucial in tagset design
Problems with tagging unknown words in inflected languages are identified
Abstract
An experiment designed to explore the relationship between tagging accuracy and the nature of the tagset is described, using corpora in English, French and Swedish. In particular, the question of internal versus external criteria for tagset design is considered, with the general conclusion that external (linguistic) criteria should be followed. Some problems associated with tagging unknown words in inflected languages are briefly considered.
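Accuracy figures depend on the tagset they are measured against, so the same tagger output can score differently under a fine and a coarse tagset. The sketch below shows this with token-level accuracy. It is an illustration under stated assumptions, not the paper's experimental procedure: the tag names, the example sentence, and the convention of marking inflectional features after a `-` are all hypothetical, and `accuracy` and `coarsen` are illustrative helpers rather than functions from any library.

```python
# A minimal sketch: token-level tagging accuracy under a fine tagset and under
# a coarser tagset derived from it. The tag names and the "-"-separated
# feature convention are hypothetical, not taken from the paper's corpora.

def accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)

def coarsen(tags):
    """Drop inflectional features, e.g. 'NOUN-FS' -> 'NOUN' (hypothetical convention)."""
    return [t.split("-", 1)[0] for t in tags]

# Hypothetical French sentence: "la table est ronde"
gold      = ["DET-FS", "NOUN-FS", "VERB-3S", "ADJ-FS"]
predicted = ["DET-FS", "NOUN-MS", "VERB-3S", "ADJ-FS"]  # one gender error

fine_acc = accuracy(gold, predicted)                      # 3 of 4 tokens match
coarse_acc = accuracy(coarsen(gold), coarsen(predicted))  # all 4 match after coarsening

print(f"fine: {fine_acc:.2f}, coarse: {coarse_acc:.2f}")  # fine: 0.75, coarse: 1.00
```

Collapsing the gender feature hides the only error, so the coarse tagset looks more accurate while carrying less information. This is one way to see why a tagset chosen only to maximise a tagger's score (an internal criterion) can discard distinctions that an inflected language makes, and why raw accuracy alone cannot settle the choice of tagset.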
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Speech and dialogue systems
