Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss
Junjie Wang, Yuxiang Zhang, Ping Yang, Ruyi Gan

TL;DR
This paper introduces Erlangshen, a pre-trained language model that ranked No.1 in the CLUE Semantic Matching Challenge by combining a knowledge-based dynamic masking strategy during pre-training with a propensity-corrected loss (PCL) during fine-tuning.
Contribution
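The paper presents Erlangshen, a pre-trained language model that uses a dynamic masking strategy for Masked Language Modeling (MLM) with whole word masking in pre-training and a propensity-corrected loss (PCL) in fine-tuning, and that took first place in the CLUE Semantic Matching Challenge.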
Findings
Achieved an F1 Score of 72.54 on the CLUE Semantic Matching Challenge test set.
Achieved an Accuracy of 78.90 on the same test set.
Ranked No.1 in the CLUE Semantic Matching Challenge.
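To make the two reported metrics concrete, the sketch below computes accuracy and F1 for a multi-class matching task. It is illustrative only: macro averaging of F1 is an assumption (this page does not state the averaging mode used by the challenge), and the labels 0/1/2 and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted relevance labels for six sentence pairs
gold = [0, 1, 2, 2, 1, 0]
pred = [0, 2, 2, 2, 1, 1]
print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"macro F1 = {macro_f1(gold, pred):.4f}")
```

Reported scores such as 72.54 and 78.90 are these quantities multiplied by 100.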
Abstract
This report describes Erlangshen, a pre-trained language model with a propensity-corrected loss that ranked No.1 in the CLUE Semantic Matching Challenge. In the pre-training stage, we construct a knowledge-based dynamic masking strategy for Masked Language Modeling (MLM) with whole word masking. Furthermore, based on the specific structure of the dataset, we apply the propensity-corrected loss (PCL) when fine-tuning the pre-trained Erlangshen. Overall, we achieve an F1 Score of 72.54 and an Accuracy of 78.90 on the test set. Our code is publicly available at: https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/hf-ds/fengshen/examples/clue_sim.
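As a rough illustration of whole word masking, the sketch below masks all sub-tokens of a word together rather than masking characters independently. It is not the authors' implementation: the knowledge-based part of their dynamic masking strategy is not described on this page and is not modeled here, and the function name, the 15% masking rate, and the pre-segmented input format are all assumptions.

```python
import random


def whole_word_mask(words, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Mask whole words at once.

    words: list of words, each given as a list of sub-tokens
           (for Chinese, typically the characters of a segmented word).
    Returns (tokens, labels); labels hold the original token at masked
    positions and None elsewhere.
    """
    rng = random.Random(seed)
    tokens, labels = [], []
    for word in words:
        # One random draw per word, so its sub-tokens share the same fate
        masked = rng.random() < mask_prob
        for tok in word:
            tokens.append(mask_token if masked else tok)
            labels.append(tok if masked else None)
    return tokens, labels


# Hypothetical pre-segmented sentence: each inner list is one word
segmented = [["语", "言"], ["模", "型"], ["很"], ["强", "大"]]

# Re-sampling the mask on every pass over the data (instead of fixing it
# once during preprocessing) is what is usually meant by dynamic masking.
for epoch in range(3):
    tokens, _ = whole_word_mask(segmented, mask_prob=0.3)
    print(epoch, "".join(tokens))
```

Because the mask is drawn per word, a two-character word such as 语言 is either fully visible or fully replaced by two `[MASK]` tokens, never half-masked.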
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Test
