Glyph-aware Embedding of Chinese Characters

Falcon Z. Dai; Zheng Cai

arXiv:1709.00028 · cs.CL · September 11, 2018 · 5 cites


TL;DR

This paper introduces a glyph-aware embedding for Chinese characters that applies convolutional neural networks (CNNs) to rendered glyph images. The resulting embeddings capture semantic and syntactic cues and improve performance on language modeling and word segmentation.

Contribution

It proposes a new character embedding approach that explicitly models Chinese glyphs using CNNs, integrating visual structure into NLP representations.

Findings

01 Improved performance in Chinese language modeling
02 Improved accuracy in Chinese word segmentation
03 Effective encoding of semantic and syntactic information
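Word segmentation is often framed as character-level sequence labeling, which is one reason a better character embedding can translate into better segmentation. As an assumption for illustration (this summary does not describe the paper's exact segmentation setup), the sketch below uses the common BMES scheme. Each character is tagged as the Begin, Middle or End of a multi-character word, or as a Single-character word. `to_bmes` and `from_bmes` are hypothetical helpers, not part of the authors' code.

```python
from typing import List, Tuple


def to_bmes(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a gold segmentation into per-character (character, tag) training pairs."""
    pairs: List[Tuple[str, str]] = []
    for w in words:
        if not w:
            continue  # skip empty tokens
        if len(w) == 1:
            pairs.append((w, "S"))
        else:
            pairs.append((w[0], "B"))
            pairs.extend((c, "M") for c in w[1:-1])
            pairs.append((w[-1], "E"))
    return pairs


def from_bmes(chars: str, tags: str) -> List[str]:
    """Rebuild words from characters and one predicted tag per character."""
    words: List[str] = []
    current = ""
    for c, t in zip(chars, tags):
        current += c
        if t in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # flush a trailing word left open by an ill-formed tag sequence
        words.append(current)
    return words


if __name__ == "__main__":
    pairs = to_bmes(["我", "喜欢", "自然语言"])
    tags = "".join(t for _, t in pairs)
    print(tags)  # SBEBMME
    print(from_bmes("我喜欢自然语言", tags))  # ['我', '喜欢', '自然语言']
```

In this framing, a tagger reads one embedding per character (glyph-aware or otherwise) and predicts one of the four tags; `from_bmes` then converts its predictions back into words.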

Abstract

Given the advantages and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic, and many Chinese logograms are composed of common substructures that provide semantic, phonetic and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character's glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Inspired by the success of convolutional neural networks in computer vision, we use them to incorporate the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks of language modeling and word segmentation, the model learns to represent each character's task-relevant semantic and syntactic information in the character-level…
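The abstract describes running a convolutional network over each character's glyph rendered as raw pixels. Below is a minimal pure-Python sketch of that idea under stated assumptions; it is not the authors' implementation (their official code is the TensorFlow repository falcondai/chinese-char-lm). It assumes glyphs have already been rendered to small grayscale bitmaps, and `GlyphEncoder`, `conv2d_valid`, `toy_glyph`, the filter count, kernel size and embedding size are all hypothetical choices. The weights here are random, whereas in the paper the network is trained on the downstream task so the embedding captures task-relevant information.

```python
import random
from typing import List

# A glyph bitmap: rows of grayscale pixel intensities (assumed to lie in [0, 1]).
Bitmap = List[List[float]]


def conv2d_valid(image: Bitmap, kernel: Bitmap) -> Bitmap:
    """Single-channel 2-D cross-correlation, stride 1, no padding ('valid')."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out: Bitmap = []
    for i in range(oh):
        row = []
        for j in range(ow):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


class GlyphEncoder:
    """Hypothetical toy encoder: conv filters -> ReLU -> global max pool -> linear projection."""

    def __init__(self, num_filters: int = 8, kernel_size: int = 3,
                 embed_dim: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)  # fixed seed so the random weights are reproducible
        self.kernels = [
            [[rng.gauss(0.0, 0.5) for _ in range(kernel_size)] for _ in range(kernel_size)]
            for _ in range(num_filters)
        ]
        self.proj = [[rng.gauss(0.0, 0.5) for _ in range(num_filters)]
                     for _ in range(embed_dim)]

    def features(self, glyph: Bitmap) -> List[float]:
        feats = []
        for kernel in self.kernels:
            fmap = conv2d_valid(glyph, kernel)
            # ReLU followed by global max pooling equals max(0, largest response).
            feats.append(max(0.0, max(max(row) for row in fmap)))
        return feats

    def embed(self, glyph: Bitmap) -> List[float]:
        # Linearly project the pooled filter responses to the character embedding.
        f = self.features(glyph)
        return [sum(w * x for w, x in zip(row, f)) for row in self.proj]


def toy_glyph(size: int = 8) -> Bitmap:
    """Hypothetical stand-in for a rendered glyph: one horizontal and one vertical stroke."""
    g = [[0.0] * size for _ in range(size)]
    for k in range(size):
        g[size // 2][k] = 1.0
        g[k][size // 2] = 1.0
    return g


if __name__ == "__main__":
    encoder = GlyphEncoder()
    vec = encoder.embed(toy_glyph())
    print(len(vec))  # 4
```

Global max pooling makes each feature record whether a filter's pattern (for example, a particular stroke arrangement) appears anywhere in the glyph. This is one plausible way shared substructures such as radicals could map to shared features.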


Code & Models

Repositories

falcondai/chinese-char-lm (TensorFlow, official)


Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications