CoLa: Chinese Character Decomposition with Compositional Latent Components

Fan Shi; Haiyang Yu; Bin Li; Xiangyang Xue

arXiv:2506.03798 · cs.CV · June 5, 2025

Open Access

TL;DR

CoLa is a novel deep latent variable model that learns to decompose Chinese characters into compositional components without predefined schemes, enabling effective zero-shot recognition and cross-dataset generalization.

Contribution

CoLa introduces a learning-to-learn approach to Chinese character decomposition. Unlike prior methods, which rely on human-defined radical or stroke schemes, it learns the decomposition itself, which improves zero-shot recognition.

Findings

01 Outperforms previous methods in zero-shot Chinese character recognition (CCR).

02 Learned components are interpretable and reflect character structure.

03 Generalizes to historical oracle bone characters.

Abstract

Humans can decompose Chinese characters into compositional components and recombine them to recognize unseen characters. This reflects two cognitive principles: Compositionality, the idea that complex concepts are built on simpler parts; and Learning-to-learn, the ability to learn strategies for decomposing and recombining components to form new concepts. These principles provide inductive biases that support efficient generalization. They are critical to Chinese character recognition (CCR) in solving the zero-shot problem, which results from the common long-tail distribution of Chinese character datasets. Existing methods have made substantial progress in modeling compositionality via predefined radical or stroke decomposition. However, they often ignore the learning-to-learn capability, limiting their ability to generalize beyond human-defined schemes. Inspired by these principles, we…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification