An Efficient Inductive Unsupervised Semantic Tagger
K T Lua (Department of Information Systems and Computer Science, National University of Singapore, Singapore)

TL;DR
This paper introduces a fast inductive unsupervised semantic tagger for Chinese words that combines a semantic dictionary with conditional and bigram probabilities, reaching a 91% hit rate at 142 words per second.
Contribution
The paper presents a simple inductive unsupervised approach to Chinese semantic tagging that runs about 2.3 times faster than a Viterbi tagger while reaching a 91% hit rate.
Findings
Achieves a 91% hit rate in semantic tagging, as measured by final manual checking.
Tags 142 words per second on a 120 MHz Pentium running FoxPro.
Runs about 2.3 times faster than a Viterbi tagger.
Abstract
We report the development of a simple, fast, and efficient inductive unsupervised semantic tagger for Chinese words. A hand-tagged POS corpus of 348,000 words is used. The corpus is tagged in two steps. First, possible semantic tags are selected using a semantic dictionary (Tong Yi Ci Ci Lin), the POS, and the conditional probability of a semantic tag given the POS, i.e., P(S|P). The final semantic tag is then assigned by considering the semantic tags before and after the current word and the semantic-word conditional probability P(S|W) derived from the first step. Semantic bigram probabilities P(S|S) are used in the second step. Final manual checking shows that this simple but efficient algorithm has a hit rate of 91%. The tagger tags 142 words per second on a 120 MHz Pentium running FoxPro. It runs about 2.3 times faster than a Viterbi tagger.
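The two-step procedure in the abstract can be illustrated with a minimal Python sketch. It is not the author's FoxPro implementation. Everything the abstract does not specify is an assumption made here for illustration: the function names, the toy dictionary and probability tables, the single-letter tag labels, the way P(S|W) is derived (normalising step-one scores per word), the use of the previous word's final tag and the next word's most likely tag as context, and the small floor value for unseen bigrams.

```python
from collections import defaultdict

def step_one(corpus, dictionary, p_s_given_p):
    """Step 1: score each dictionary tag of a word by P(S|P) for its POS."""
    return [{s: p_s_given_p.get((s, pos), 0.0) for s in dictionary.get(word, [])}
            for word, pos in corpus]

def estimate_p_s_given_w(corpus, candidates):
    """Assumed derivation of P(S|W): normalise step-one scores per word."""
    totals = defaultdict(lambda: defaultdict(float))
    for (word, _), scores in zip(corpus, candidates):
        for s, p in scores.items():
            totals[word][s] += p
    table = {}
    for word, scores in totals.items():
        z = sum(scores.values())
        # Fall back to a uniform distribution if every score is zero.
        table[word] = {s: (p / z if z > 0 else 1.0 / len(scores))
                       for s, p in scores.items()}
    return table

def most_likely(dist):
    return max(dist, key=dist.get) if dist else None

def step_two(corpus, p_s_given_w, p_bigram, floor=1e-6):
    """Step 2: pick the tag maximising P(S|W) times the bigram terms
    with the neighbouring tags (hypothetical scoring rule)."""
    dists = [p_s_given_w.get(word, {}) for word, _ in corpus]
    guesses = [most_likely(d) for d in dists]
    final = []
    for i, dist in enumerate(dists):
        prev_tag = final[i - 1] if i > 0 else None
        next_tag = guesses[i + 1] if i + 1 < len(dists) else None

        def score(s):
            left = p_bigram.get((prev_tag, s), floor) if prev_tag else 1.0
            right = p_bigram.get((s, next_tag), floor) if next_tag else 1.0
            return dist[s] * left * right

        final.append(max(dist, key=score) if dist else None)
    return final

# Toy data (all values invented for illustration only).
corpus = [("他", "r"), ("打", "v"), ("球", "n")]
dictionary = {"他": ["A"], "打": ["H", "F"], "球": ["B"]}
p_s_given_p = {("A", "r"): 0.9, ("H", "v"): 0.5, ("F", "v"): 0.5, ("B", "n"): 0.8}
p_bigram = {("A", "H"): 0.4, ("A", "F"): 0.1, ("H", "B"): 0.3, ("F", "B"): 0.3}

candidates = step_one(corpus, dictionary, p_s_given_p)
p_s_given_w = estimate_p_s_given_w(corpus, candidates)
tags = step_two(corpus, p_s_given_w, p_bigram)
print(tags)
```

In this toy run the ambiguous word has two equally likely tags after step one, so the bigram term with the preceding tag decides between them. Because each word is resolved in a single left-to-right pass with local context only, a scheme of this kind avoids the full lattice search of a Viterbi tagger, which is one plausible reading of the speed difference reported above.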
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
