Text Classification through Glyph-aware Disentangled Character Embedding and Semantic Sub-character Augmentation
Takumi Aoki, Shunsuke Kitada, Hitoshi Iyatomi

TL;DR
This paper introduces a text classification framework for non-alphabetic languages that combines a glyph-aware disentangled character embedding (GDCE) with semantic sub-character augmentation (SSA), improving both interpretability and classification performance.
Contribution
It presents a variational character encoder (VCE) that learns glyph-aware disentangled character embeddings, together with a semantic sub-character augmentation method; the two improve embedding interpretability and classification accuracy.
Findings
GDCE provides interpretable, dimensionally independent embeddings
SSA improves classification performance
The framework achieves results competitive with state-of-the-art models
Abstract
We propose a new character-based text classification framework for non-alphabetic languages, such as Chinese and Japanese. Our framework consists of a variational character encoder (VCE) and character-level text classifier. The VCE is composed of a β-variational auto-encoder (β-VAE) that learns the proposed glyph-aware disentangled character embedding (GDCE). Since our GDCE provides zero-mean unit-variance character embeddings that are dimensionally independent, it is applicable for our interpretable data augmentation, namely, semantic sub-character augmentation (SSA). In this paper, we evaluated our framework using Japanese text classification tasks at the document- and sentence-level. We confirmed that our GDCE and SSA not only provided embedding interpretability but also improved the classification performance. Our proposal achieved a competitive result to the…
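For context, the general β-VAE objective is usually written as below. This is the standard textbook form, given here as a sketch and not necessarily the exact loss used in this paper; the symbols x (input, here assumed to be a character glyph), z (latent embedding), and the encoder and decoder parameters φ and θ are conventional notation assumed for illustration.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \beta \, D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr),
\qquad p(z) = \mathcal{N}(0, I)
```

With β > 1, the KL term pulls the approximate posterior more strongly toward the factorized standard normal prior, which is consistent with the zero-mean, unit-variance, dimensionally independent embeddings the abstract describes.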
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
Methods: Interpretability
