Moto: Enhancing Embedding with Multiple Joint Factors for Chinese Text Classification
Xunzhu Tang, Rujie Zhu, Tiezhu Sun, Shi Wang

TL;DR
This paper introduces Moto, a novel Chinese text classification model that uses an attention mechanism to fuse multiple linguistic factors such as radicals, Pinyin, and Wubi, achieving state-of-the-art results.
Contribution
The paper proposes a new multi-factor embedding model for Chinese text classification that integrates various linguistic features with an attention mechanism, outperforming existing methods.
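The idea of attention-based fusion can be illustrated with a minimal sketch. This is not the authors' architecture: the scaled dot-product scoring, the `fuse_factors` helper, the `query` vector, the toy embedding values, and the choice of "character" as a fourth factor are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_factors(factor_vectors, query):
    """Attention-weighted sum of per-factor embeddings (hypothetical helper).

    factor_vectors: dict mapping a factor name to its embedding (list of floats).
    query: context vector of the same dimension, used to score each factor.
    Returns the fused embedding and the attention weight of each factor.
    """
    names = list(factor_vectors)
    dim = len(query)
    # Scaled dot-product score between the query and each factor embedding.
    scores = [
        sum(q * v for q, v in zip(query, factor_vectors[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the factor embeddings, one dimension at a time.
    fused = [
        sum(w * factor_vectors[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy, made-up embeddings for a single token (assumption, not from the paper).
factors = {
    "character": [0.2, 0.1, 0.4],
    "radical": [0.5, -0.3, 0.1],
    "pinyin": [-0.1, 0.6, 0.2],
    "wubi": [0.3, 0.3, -0.2],
}
query = [1.0, 0.0, 0.5]

fused, weights = fuse_factors(factors, query)
print(weights)
print(fused)
```

In this sketch the weights sum to 1, so the fused vector is a convex combination of the factor embeddings, and a factor that aligns better with the query contributes more to the result.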
Findings
Achieved state-of-the-art F1-score of 0.8316 on Chinese news titles.
Improved classification accuracy on Fudan Corpus by 1.24%.
Enhanced performance on THUCNews with a 3.26% increase.
Abstract
Recently, language representation techniques have achieved strong performance in text classification. However, most existing representation models are designed specifically for English text and may fail on Chinese because of the large differences between the two languages. Actually, few existing methods for Chinese text classification process texts at a single level. However, as a special kind of hieroglyphics, radicals of Chinese characters are good semantic carriers. In addition, Pinyin codes carry the semantics of tones, and Wubi reflects stroke structure information, etc. Unfortunately, previous research has neglected to find an effective way to distill the useful parts of these four factors and fuse them. In this work, we propose a novel model called Moto: Enhancing Embedding with Multiple Joint Factors. Specifically, we design an…
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
