Logographic Information Aids Learning Better Representations for Natural Language Inference
Zijian Jin, Duygu Ataman

TL;DR
This paper investigates how incorporating logographic visual features into language models enhances semantic understanding in natural language inference, especially for languages with logographic scripts like Chinese and Vietnamese.
Contribution
It introduces a multi-modal approach combining contextual and glyph information, demonstrating improved representations for logographic languages in NLI tasks.
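As a rough illustration of what combining contextual and glyph information can look like, the sketch below concatenates a contextual embedding with features flattened from a character bitmap. The summary does not say how the paper fuses the two views, so concatenation is an assumption here. The helper names (`flatten_glyph`, `fuse`) and all values are hypothetical.

```python
from typing import List


def flatten_glyph(bitmap: List[List[int]]) -> List[float]:
    """Flatten a 2-D glyph bitmap (rows of 0/1 pixels) into a feature vector."""
    return [float(pixel) for row in bitmap for pixel in row]


def fuse(contextual: List[float], glyph: List[float]) -> List[float]:
    """Combine the two views by concatenation (an assumption, not the paper's method)."""
    return list(contextual) + list(glyph)


# Toy 3x3 bitmap standing in for a rendered character image (made-up values).
toy_bitmap = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

# Toy contextual embedding, as if produced by a language model (made-up values).
toy_contextual = [0.2, -0.5, 0.1, 0.7]

fused = fuse(toy_contextual, flatten_glyph(toy_bitmap))
print(len(fused))  # 4 contextual dims + 9 glyph dims = 13
```

In a real system the glyph features would more plausibly come from a learned image encoder and the fused vector would feed an NLI classifier. Both are left out to keep the sketch minimal.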
Findings
Multi-modal embeddings outperform traditional models in logographic languages.
Glyph information significantly benefits low-frequency words.
Results are consistent across six diverse languages.
Abstract
Statistical language models conventionally implement representation learning based on the contextual distribution of words or other formal units, whereas any information related to the logographic features of written text is often ignored, on the assumption that it can be recovered from co-occurrence statistics. On the other hand, as language models become larger and require more data to learn reliable representations, such assumptions may start to break down, especially under conditions of data sparsity. Many languages, including Chinese and Vietnamese, use logographic writing systems where surface forms are represented as a visual organization of smaller graphemic units, which often contain many semantic cues. In this paper, we present a novel study which explores the benefits of providing language models with logographic information in learning better semantic representations. We test…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Test
