SASICM A Multi-Task Benchmark For Subtext Recognition
Hua Yan, Feng Han, Junyi An, Weikang Xiao, Jian Zhao, Furao Shen

TL;DR
This paper introduces SASICM, a multi-task benchmark dataset and baseline model for subtext recognition in Chinese social media, demonstrating improved performance over existing methods and models.
Contribution
It provides a new Chinese subtext dataset from social media and proposes the SASICM baseline model, achieving higher F1 scores than traditional and state-of-the-art methods.
Findings
SASICMg with GloVe achieves 64.37% F1 score, outperforming BERT-based models.
SASICMBERT with BERT achieves 65.12% F1 score, slightly better than SASICMg.
The models achieve over 70% accuracy, competitive with existing approaches.
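The findings report both F1 and accuracy, so a minimal sketch of how the two metrics are computed for binary subtext recognition may help. The labels below are invented toy data (1 = has subtext, 0 = no subtext), not taken from the SASICM dataset, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (precision, recall, f1, accuracy) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # subtext correctly found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed subtext
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # plain text correctly rejected

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, f1, accuracy


# Toy example (invented): 3 texts with subtext, 7 without.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
precision, recall, f1, accuracy = binary_metrics(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} "
      f"f1={f1:.4f} accuracy={accuracy:.4f}")
```

When the positive class is the minority, as in this toy data, accuracy can sit noticeably above F1. That is one plausible reading of why the reported accuracy exceeds 70% while F1 stays in the mid-60s, though the summary does not state the class balance.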
Abstract
Subtext is a kind of deep semantics that can be acquired after one or more rounds of expression transformation. As a popular way of expressing one's intentions, it is well worth studying. In this paper, we try to make computers recognize whether a text carries a subtext by means of machine learning. We build a Chinese dataset whose source data comes from popular social media (e.g. Weibo, Netease Music, Zhihu, and Bilibili). In addition, we build a baseline model called SASICM for subtext recognition. The F1 score of SASICMg, whose pretrained model is GloVe, is as high as 64.37%, which is 3.97% higher than that of the BERT-based model, 12.7% higher on average than that of traditional methods (support vector machine, logistic regression classifier, maximum entropy classifier, naive Bayes classifier, and decision tree), and 2.39% higher than that of the state-of-the-art,…
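The comparisons in the abstract imply baseline F1 scores that can be recovered by subtraction. The sketch below assumes the stated gaps are absolute percentage points rather than relative improvements, which the abstract does not say explicitly. The variable and dictionary names are hypothetical.

```python
# F1 of SASICMg as reported in the abstract, in percent.
SASICM_G_F1 = 64.37

# Reported gaps between SASICMg and each comparison group.
# ASSUMPTION: gaps are absolute percentage points, not relative gains.
reported_gaps = {
    "BERT-based model": 3.97,
    "traditional methods (average)": 12.7,
    "state-of-the-art": 2.39,
}

# Implied baseline F1 = SASICMg F1 minus the reported gap.
implied_f1 = {
    name: round(SASICM_G_F1 - gap, 2) for name, gap in reported_gaps.items()
}

for name, score in implied_f1.items():
    print(f"{name}: {score:.2f}% F1 (implied)")
```

Under that assumption the implied scores are about 60.40% for the BERT-based model, 51.67% on average for the traditional methods, and 61.98% for the state-of-the-art. These are derived figures, not values quoted from the paper.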
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Music and Audio Processing
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · GloVe Embeddings · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout
