Blow the Dog Whistle: A Chinese Dataset for Cant Understanding with Common Sense and World Knowledge
Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, and Furu Wei

TL;DR
This paper introduces a large Chinese dataset for cant understanding and argues that the task is hard enough to serve as a benchmark for deep language understanding, common sense, and world knowledge in pretrained models.
Contribution
The paper provides the first large Chinese cant dataset, formulates a cant understanding task, and analyzes model performance, advancing computational understanding of cant and related language phenomena.
Findings
Pretrained models struggle with cant understanding, which indicates how complex the task is.
Cant understanding requires deep language understanding, common sense, and world knowledge.
The dataset serves as a new benchmark for evaluating language models.
Abstract
Cant is important for understanding advertising, comedies and dog-whistle politics. However, computational research on cant is hindered by a lack of available datasets. In this paper, we propose a large and diverse Chinese dataset for creating and understanding cant from a computational linguistics perspective. We formulate a task for cant understanding and provide both quantitative and qualitative analysis for tested word embedding similarity and pretrained language models. Experiments suggest that such a task requires deep language understanding, common sense, and world knowledge and thus can be a good testbed for pretrained language models and help models perform better on other tasks. The code is available at https://github.com/JetRunner/dogwhistle. The data and leaderboard are available at https://competitions.codalab.org/competitions/30451.
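The abstract mentions word embedding similarity as one of the tested approaches. A minimal sketch of how such a baseline might rank candidate meanings for a cant term is shown here. It assumes a candidate-ranking formulation, which the abstract does not spell out; the function names, the toy words, and the three-dimensional vectors are all hypothetical placeholders, not the authors' code or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_candidates(cant_vec, candidates):
    """Sort candidate words by similarity to the cant vector, best first."""
    scored = [(word, cosine(cant_vec, vec)) for word, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration only; a real baseline would load
# pretrained Chinese word embeddings instead.
toy_embeddings = {
    "snow": [0.9, 0.1, 0.0],
    "bank": [0.0, 0.2, 0.9],
    "tree": [0.1, 0.9, 0.1],
}
cant_vector = [0.8, 0.2, 0.1]  # hypothetical embedding of a cant term

ranking = rank_candidates(cant_vector, toy_embeddings)
best_guess = ranking[0][0]
```

A surface-similarity baseline like this has no access to common sense or world knowledge, which is consistent with the abstract's claim that the task needs more than lexical similarity.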
Taxonomy
Topics: Humor Studies and Applications · Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection
