Creating a tagset, lexicon and guesser for a French tagger
Jean-Pierre Chanod, Pasi Tapanainen (Rank Xerox Research Centre, Grenoble Laboratory)

TL;DR
This paper describes how the tagset, the lexicon, and the unknown-word guesser were built for two French taggers, one statistical and one constraint-based, which share the same tokeniser and morphological analyser.
Contribution
It defines a tagset, derives the lexicon from an existing two-level morphological analyser, and builds a lexical transducer that guesses the analyses of unknown French words.
Findings
Enhanced French tagger resources
Effective lexical guessing for unknown words
Integration of morphological analysis with tagging
Abstract
We earlier described two taggers for French, a statistical one and a constraint-based one. The two taggers have the same tokeniser and morphological analyser. In this paper, we describe aspects of this work concerned with the definition of the tagset, the building of the lexicon, derived from an existing two-level morphological analyser, and the definition of a lexical transducer for guessing unknown words.
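The abstract says only that unknown words are handled by a lexical transducer. The following is a minimal sketch of the general idea of ending-based guessing, where an unknown word is given a set of candidate tags that the tagger then disambiguates. Everything here is an assumption for illustration: it uses a plain Python dictionary instead of a finite-state transducer, and the suffix table, the tag names (`ADV`, `NOUN-MS`, `PROPER-NOUN`, ...) and the function `guess_tags` are hypothetical. They are not the authors' rules or tagset.

```python
from typing import Dict, FrozenSet

# Hypothetical suffix table: word ending -> candidate tags.
# Illustrative only; NOT the tagset or guesser rules from the paper.
SUFFIX_TAGS: Dict[str, FrozenSet[str]] = {
    "ment": frozenset({"ADV", "NOUN-MS"}),         # rapidement (adverb) vs. gouvernement (noun)
    "ions": frozenset({"VERB-P1PL", "NOUN-FP"}),   # chantions (verb) vs. nations (noun)
    "er": frozenset({"VERB-INF", "NOUN-MS"}),      # chanter (verb) vs. boulanger (noun)
    "ique": frozenset({"ADJ-SG", "NOUN-SG"}),
    "s": frozenset({"NOUN-PL", "ADJ-PL"}),
}

# Hypothetical fallback: unknown words are assumed to belong to open classes.
DEFAULT_TAGS: FrozenSet[str] = frozenset({"NOUN-SG", "ADJ-SG"})


def guess_tags(word: str) -> FrozenSet[str]:
    """Return the candidate tags of an unknown word from its longest known ending."""
    # Simplification: any capitalised word is guessed as a proper noun,
    # which ignores sentence-initial capitals.
    if word[:1].isupper():
        return frozenset({"PROPER-NOUN"})
    lower = word.lower()
    # Try the longest ending first, so that "nations" matches "ions" before "s".
    for length in range(len(lower), 0, -1):
        tags = SUFFIX_TAGS.get(lower[-length:])
        if tags is not None:
            return tags
    return DEFAULT_TAGS


if __name__ == "__main__":
    for w in ["rapidement", "nations", "chats", "Tapanainen", "xyz"]:
        print(w, sorted(guess_tags(w)))
```

The guesser deliberately returns an ambiguous set rather than a single tag. For example, `rapidement` receives both `ADV` and `NOUN-MS`, and choosing between them is left to the statistical or constraint-based tagger using context. A real transducer-based guesser would encode the same ending-to-analysis mapping as a finite-state network, so that it can be composed with the rest of the lexical machinery.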
Taxonomy
Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Mathematics, Computing, and Information Processing
