Learning Chinese Word Representations From Glyphs Of Characters
Tzu-Ray Su, Hung-Yi Lee

TL;DR
This paper proposes methods that enhance Chinese word representations with glyph features learned from character bitmaps by a convolutional auto-encoder, and releases new evaluation datasets in traditional Chinese.
Contribution
It presents a new approach combining glyph features with character embeddings to improve Chinese word representations and releases publicly available evaluation datasets.
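One way to picture how glyph features might be combined with character embeddings is sketched below. It assumes an additive composition: a word's vector is its own embedding plus the average, over its characters, of each character embedding summed with that character's glyph feature. This formula, the name `compose_word_vector`, and the toy vectors are illustrative assumptions; the summary here does not state the paper's exact model.

```python
from typing import Dict, List

Vector = List[float]


def compose_word_vector(
    word: str,
    word_emb: Dict[str, Vector],
    char_emb: Dict[str, Vector],
    glyph_feat: Dict[str, Vector],
) -> Vector:
    """Hypothetical composition: word embedding + mean(char embedding + glyph feature).

    All vectors are assumed to share one dimensionality.
    """
    base = word_emb[word]
    dim = len(base)
    char_sum = [0.0] * dim
    for ch in word:
        # Enrich each character embedding with its glyph feature (assumed additive).
        enriched = [c + g for c, g in zip(char_emb[ch], glyph_feat[ch])]
        char_sum = [s + e for s, e in zip(char_sum, enriched)]
    char_mean = [s / len(word) for s in char_sum]
    return [b + m for b, m in zip(base, char_mean)]


# Toy 2-dimensional vectors, invented purely for illustration.
word_emb = {"好人": [1.0, 0.0]}
char_emb = {"好": [0.0, 2.0], "人": [2.0, 0.0]}
glyph_feat = {"好": [1.0, 1.0], "人": [1.0, 1.0]}

vector = compose_word_vector("好人", word_emb, char_emb, glyph_feat)
print(vector)
```

In a sketch like this, characters that share graphical components would receive similar glyph features, so words built from them would be pulled toward each other even when the character embeddings alone differ.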
Findings
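Glyph features improve word representations that are already enhanced by character embeddings
Glyph features learned directly from character bitmaps by a convolutional auto-encoder capture semantics carried by graphical components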
New evaluation datasets for traditional Chinese are provided
Abstract
In this paper, we propose new methods to learn Chinese word representations. Chinese characters are composed of graphical components, which carry rich semantics. It is common for a Chinese learner to comprehend the meaning of a word from these graphical components. As a result, we propose models that enhance word representations by character glyphs. The character glyph features are directly learned from the bitmaps of characters by a convolutional auto-encoder (convAE), and the glyph features improve Chinese word representations which are already enhanced by character embeddings. Another contribution in this paper is that we created several evaluation datasets in traditional Chinese and made them public.
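To make the idea of extracting features from a character bitmap concrete, the sketch below runs a single convolution, ReLU, and max-pooling step over a toy 5x5 bitmap. It illustrates only the encoder side of a convolutional auto-encoder, and only its forward pass: in an actual convAE the filters are learned by training a decoder to reconstruct the bitmap from the encoded features. The bitmap, the hand-chosen filter, and all function names are assumptions for illustration, not the architecture used in the paper.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """2D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(m: Matrix) -> Matrix:
    """Clamp negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in m]


def max_pool2x2(m: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling (stride 2)."""
    return [
        [
            max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
            for j in range(0, len(m[0]) - 1, 2)
        ]
        for i in range(0, len(m) - 1, 2)
    ]


def encode(bitmap: Matrix, kernel: Matrix) -> List[float]:
    """Conv -> ReLU -> pool, flattened into a feature vector."""
    pooled = max_pool2x2(relu(conv2d_valid(bitmap, kernel)))
    return [v for row in pooled for v in row]


# Toy 5x5 bitmap shaped like a cross (loosely resembling the character 十).
BITMAP: Matrix = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
]

# Hand-chosen filter that responds to a bright-to-dark vertical edge;
# a trained convAE would learn its filters instead.
KERNEL: Matrix = [
    [1.0, -1.0],
    [1.0, -1.0],
]

features = encode(BITMAP, KERNEL)
print(features)
```

The resulting vector is nonzero only where the right edge of the vertical stroke falls, which is the sense in which convolutional features reflect the graphical components of a character.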
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
