Text Classification through Glyph-aware Disentangled Character Embedding and Semantic Sub-character Augmentation

Takumi Aoki; Shunsuke Kitada; Hitoshi Iyatomi

arXiv:2011.04184 · cs.CL · November 10, 2020

TL;DR

This paper introduces a character-based text classification framework for non-alphabetic languages such as Chinese and Japanese. It combines a glyph-aware disentangled character embedding (GDCE) with semantic sub-character augmentation (SSA) to improve both embedding interpretability and classification performance.

Contribution

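The paper presents a variational character encoder (VCE), built on a $β$-VAE, that learns glyph-aware disentangled character embeddings (GDCE), together with an interpretable data augmentation method, semantic sub-character augmentation (SSA), that operates on those embeddings.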

Findings

01 GDCE provides interpretable, dimensionally independent character embeddings.

02 SSA improves classification performance.

03 The framework achieves results competitive with state-of-the-art models.

Abstract

We propose a new character-based text classification framework for non-alphabetic languages, such as Chinese and Japanese. Our framework consists of a variational character encoder (VCE) and character-level text classifier. The VCE is composed of a $β$-variational auto-encoder ($β$-VAE) that learns the proposed glyph-aware disentangled character embedding (GDCE). Since our GDCE provides zero-mean unit-variance character embeddings that are dimensionally independent, it is applicable for our interpretable data augmentation, namely, semantic sub-character augmentation (SSA). In this paper, we evaluated our framework using Japanese text classification tasks at the document- and sentence-level. We confirmed that our GDCE and SSA not only provided embedding interpretability but also improved the classification performance. Our proposal achieved a competitive result to the…
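
The abstract attributes the zero-mean, unit-variance, dimensionally independent embeddings to the $β$-VAE. The sketch below shows the standard $β$-VAE objective in plain Python to illustrate why: the KL term pulls each latent dimension toward a standard normal prior, and $β$ scales that pull. It is a minimal sketch of the generic formulation and is not taken from the authors' repository. The squared-error reconstruction term, the default `beta=4.0`, and the function names `gaussian_kl` and `beta_vae_loss` are all assumptions made for illustration.

```python
import math


def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    The value is zero exactly when every dimension has mean 0 and variance 1.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def beta_vae_loss(x, x_hat, mu, log_var, beta=4.0):
    """Generic beta-VAE loss: reconstruction error plus beta-weighted KL.

    Assumption: squared error stands in for the reconstruction term; a model
    that decodes glyph images might use a different likelihood.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * gaussian_kl(mu, log_var)


# A latent code that already matches the prior incurs no KL penalty,
# so only the reconstruction error remains.
loss = beta_vae_loss(x=[1.0, 0.0], x_hat=[0.5, 0.0], mu=[0.0, 0.0], log_var=[0.0, 0.0])
```

With `beta` greater than 1, the KL penalty outweighs reconstruction more heavily than in a plain VAE, which is the usual mechanism for encouraging each latent dimension to vary independently.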

Code & Models

Repositories

IyatomiLab/GDCE-SSA (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies

Methods: Interpretability