Learning Chinese Word Representations From Glyphs Of Characters

Tzu-Ray Su; Hung-Yi Lee

arXiv:1708.04755·cs.CL·August 17, 2017·21 cites

TL;DR

This paper introduces methods that enhance Chinese word representations with character glyph features, which a convolutional auto-encoder learns directly from character bitmaps, and it provides new evaluation datasets for traditional Chinese.

Contribution

It presents a new approach combining glyph features with character embeddings to improve Chinese word representations and releases publicly available evaluation datasets.

Findings

01. Glyph features improve word representations

02. Character glyphs learned from bitmaps enhance semantic understanding

03. New evaluation datasets for traditional Chinese are provided

Abstract

In this paper, we propose new methods to learn Chinese word representations. Chinese characters are composed of graphical components, which carry rich semantics. It is common for a Chinese learner to comprehend the meaning of a word from these graphical components. As a result, we propose models that enhance word representations with character glyphs. The character glyph features are learned directly from the bitmaps of characters by a convolutional auto-encoder (convAE), and the glyph features improve Chinese word representations that are already enhanced by character embeddings. Another contribution of this paper is that we created several evaluation datasets in traditional Chinese and made them public.
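As a rough illustration of the data flow the abstract describes (character bitmap, then glyph feature, then word representation), here is a minimal Python sketch. It is not the authors' implementation: the average-pooling `glyph_feature` function is a stand-in for the encoder of a trained convAE, and the toy 4x4 bitmaps, the embedding values, and the choice to concatenate vectors and average over characters are all assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]
Bitmap = List[List[int]]  # rows of binary pixels (1 = ink, 0 = background)


def glyph_feature(bitmap: Bitmap, pool: int = 2) -> Vector:
    """Stand-in for a trained convAE encoder (an assumption, not the paper's model):
    average-pool non-overlapping pool x pool blocks and flatten the result."""
    height, width = len(bitmap), len(bitmap[0])
    features: Vector = []
    for r in range(0, height - pool + 1, pool):
        for c in range(0, width - pool + 1, pool):
            block = [bitmap[r + i][c + j] for i in range(pool) for j in range(pool)]
            features.append(sum(block) / len(block))
    return features


def char_vector(ch: str, char_emb: Dict[str, Vector],
                bitmaps: Dict[str, Bitmap]) -> Vector:
    """Character embedding concatenated with that character's glyph feature."""
    return char_emb[ch] + glyph_feature(bitmaps[ch])


def word_vector(word: str, word_emb: Dict[str, Vector],
                char_emb: Dict[str, Vector],
                bitmaps: Dict[str, Bitmap]) -> Vector:
    """Word embedding concatenated with the mean of its character vectors."""
    chars = [char_vector(ch, char_emb, bitmaps) for ch in word]
    mean_chars = [sum(dims) / len(chars) for dims in zip(*chars)]
    return word_emb[word] + mean_chars


# Toy data: hypothetical values, not real glyph renderings or trained embeddings.
BITMAPS: Dict[str, Bitmap] = {
    "樹": [[0, 1, 1, 0], [1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0]],
    "林": [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]],
}
CHAR_EMB: Dict[str, Vector] = {"樹": [0.5, -0.5], "林": [0.25, 0.75]}
WORD_EMB: Dict[str, Vector] = {"樹林": [1.0, 0.0, -1.0]}

vec = word_vector("樹林", WORD_EMB, CHAR_EMB, BITMAPS)
print(len(vec), vec)
```

In this sketch, replacing `glyph_feature` with the encoder half of a trained auto-encoder would leave the rest of the flow unchanged, which is the sense in which glyph features are added on top of character embeddings.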

Code & Models

Repositories

ray1007/GWE (tf, Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques