A BERT-based Dual Embedding Model for Chinese Idiom Prediction
Minghuan Tan, Jing Jiang

TL;DR
This paper introduces a BERT-based dual embedding model for Chinese idiom prediction that matches candidate idiom embeddings against context representations, improving accuracy on a Chinese idiom cloze test dataset.
Contribution
The paper presents a novel dual embedding approach for Chinese idiom prediction that combines context pooling with BERT and outperforms previous methods.
Findings
Outperforms the existing state of the art on a Chinese idiom cloze test dataset
Both context pooling and dual embeddings significantly improve performance
Ablation studies confirm the effectiveness of each component
Abstract
Chinese idioms are special fixed phrases usually derived from ancient stories, whose meanings are often highly idiomatic and non-compositional. The Chinese idiom prediction task is to select the correct idiom from a set of candidate idioms given a context with a blank. We propose a BERT-based dual embedding model to encode the contextual words as well as to learn dual embeddings of the idioms. Specifically, we first match the embedding of each candidate idiom with the hidden representation corresponding to the blank in the context. We then match the embedding of each candidate idiom with the hidden representations of all the tokens in the context through context pooling. We further propose to use two separate idiom embeddings for the two kinds of matching. Experiments on a recently released Chinese idiom cloze test dataset show that our proposed method performs better than the…
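To make the two kinds of matching concrete, here is a minimal pure-Python sketch of how a candidate could be scored. It assumes the BERT hidden states are already available as plain vectors (the toy values below are made up, not real BERT outputs), that context pooling takes a max over per-token dot products, and that the blank-matching and pooled scores are simply added. These choices, and all names such as `score_candidate`, `predict`, `emb_blank_table` and `emb_pool_table`, are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def score_candidate(hidden: Sequence[Vector], blank_pos: int,
                    emb_blank: Vector, emb_pool: Vector) -> float:
    """Score one candidate idiom with two separate embeddings (assumed form)."""
    # Matching 1: hidden state at the blank vs. the first idiom embedding.
    blank_score = dot(hidden[blank_pos], emb_blank)
    # Matching 2 (context pooling): every token's hidden state vs. the
    # second idiom embedding, reduced with max pooling (an assumption).
    pool_score = max(dot(h, emb_pool) for h in hidden)
    # Combining the two scores by addition is also an assumption.
    return blank_score + pool_score


def predict(hidden: Sequence[Vector], blank_pos: int, candidates: List[str],
            emb_blank_table: Dict[str, Vector],
            emb_pool_table: Dict[str, Vector]) -> Tuple[str, List[float]]:
    """Return the best candidate and a softmax distribution over candidates."""
    scores = [score_candidate(hidden, blank_pos,
                              emb_blank_table[c], emb_pool_table[c])
              for c in candidates]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = candidates[max(range(len(candidates)), key=lambda i: scores[i])]
    return best, probs


# Toy stand-ins: three 2-d "hidden states"; position 2 plays the blank.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidates = ["idiom_A", "idiom_B"]  # hypothetical candidate labels
emb_blank_table = {"idiom_A": [1.0, 1.0], "idiom_B": [-1.0, -1.0]}
emb_pool_table = {"idiom_A": [1.0, 0.0], "idiom_B": [0.0, 0.5]}

best, probs = predict(hidden, 2, candidates, emb_blank_table, emb_pool_table)
print(best, [round(p, 3) for p in probs])
```

Keeping `emb_blank_table` and `emb_pool_table` separate mirrors what the abstract calls dual embeddings: each candidate has one vector for matching the blank and another for matching the pooled context, so the two kinds of matching do not have to share parameters.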
Taxonomy
Topics: Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining · Topic Modeling
