GWPT: A Green Word-Embedding-based POS Tagger

Chengwei Wei; Runqi Pang; C.-C. Jay Kuo

arXiv:2401.07475 · cs.CL · January 17, 2024 · 1 citation

Open Access

TL;DR

GWPT is a lightweight, efficient POS tagger that partitions word-embedding dimensions by frequency and achieves high accuracy with fewer resources, making it suitable for resource-constrained NLP applications.

Contribution

This work introduces GWPT, a green-learning-based POS tagger that partitions word-embedding dimension indices by frequency and represents each set with a different N-gram, reducing model size and computational complexity while maintaining accuracy.

Findings

01. Achieves state-of-the-art accuracy among lightweight POS taggers.

02. Uses significantly fewer parameters and lower computational complexity.

03. Effective with both non-contextual and contextual embeddings.

Abstract

As a fundamental tool for natural language processing (NLP), the part-of-speech (POS) tagger assigns a POS label to each word in a sentence. This work proposes a novel lightweight POS tagger based on word embeddings, named GWPT (green word-embedding-based POS tagger). Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) feature learning, and 3) decision learning. The main novelty of GWPT lies in representation learning: it takes non-contextual or contextual word embeddings, partitions the embedding dimension indices into low-, medium-, and high-frequency sets, and represents each set with a different N-gram. Experimental results show that GWPT offers state-of-the-art accuracies with fewer model parameters and significantly lower computational complexity in both training and inference as compared with…
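
The representation-learning idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the frequency of an embedding dimension is measured or which N-gram size goes with which set, so everything specific here is an assumption: the frequency proxy (the fraction of adjacent word pairs at which a mean-removed dimension flips sign), the equal-size three-way split, the window sizes `(2, 1, 0)`, and the hypothetical function names `sign_change_ratio`, `partition_dims`, and `ngram_features`. It is not the authors' implementation.

```python
import numpy as np


def sign_change_ratio(emb):
    """Assumed frequency proxy for each embedding dimension.

    emb: (T, D) array holding the embeddings of the T words of one sentence
    (T >= 2). Returns a (D,) array: the fraction of adjacent word pairs at
    which the mean-removed value of that dimension changes sign.
    """
    centered = emb - emb.mean(axis=0, keepdims=True)
    signs = np.sign(centered)
    flips = signs[1:] * signs[:-1] < 0
    return flips.mean(axis=0)


def partition_dims(ratio, n_sets=3):
    """Split dimension indices into n_sets groups, ordered low to high ratio."""
    order = np.argsort(ratio, kind="stable")
    return [np.sort(part) for part in np.array_split(order, n_sets)]


def ngram_features(emb, t, dim_sets, windows=(2, 1, 0)):
    """Build the feature vector for the word at position t.

    Each dimension set is read over a context of +/- w words (w taken from
    `windows`, an arbitrary placeholder choice); positions outside the
    sentence are zero-padded.
    """
    n_words = emb.shape[0]
    parts = []
    for dims, w in zip(dim_sets, windows):
        for offset in range(-w, w + 1):
            pos = t + offset
            if 0 <= pos < n_words:
                parts.append(emb[pos, dims])
            else:
                parts.append(np.zeros(len(dims)))
    return np.concatenate(parts)


# Toy usage: a 6-word sentence with 9-dimensional random embeddings.
rng = np.random.default_rng(0)
sentence = rng.standard_normal((6, 9))
dim_sets = partition_dims(sign_change_ratio(sentence))
features = ngram_features(sentence, t=0, dim_sets=dim_sets)
print(features.shape)  # (27,) = 3 dims * (5 + 3 + 1) context positions
```

In the three-module cascade, a vector like `features` would then go through feature learning and decision learning (any per-word classifier); those stages are not sketched here because the abstract gives no detail about them.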

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies