GLS-CSC: A Simple but Effective Strategy to Mitigate Chinese STM Models' Over-Reliance on Superficial Clue

Yanrui Du; Sendong Zhao; Yuhan Chen; Rai Bai; Jing Liu; Hua Wu; Haifeng Wang; Bing Qin

arXiv:2309.04162·cs.CL·September 11, 2023·1 cites

Open Access

TL;DR

This paper introduces GLS-CSC, a resampling training strategy that reduces Chinese STM models' over-reliance on superficial clues like edit distance, thereby improving robustness and generalization across various test sets.

Contribution

The paper proposes a novel GLS-CSC strategy to mitigate superficial clue reliance in Chinese STM models, enhancing their robustness and generalization capabilities.

Findings

01. GLS-CSC outperforms existing methods in robustness tests.

02. The strategy improves model generalization across different domains.

03. Analysis reveals common issues in current superficial clue mitigation methods.

Abstract

Pre-trained models have achieved success in Chinese Short Text Matching (STM) tasks, but they often rely on superficial clues, leading to a lack of robust predictions. To address this issue, it is crucial to analyze and mitigate the influence of superficial clues on STM models. Our study aims to investigate their over-reliance on the edit distance feature, commonly used to measure the semantic similarity of Chinese text pairs, which can be considered a superficial clue. To mitigate STM models' over-reliance on superficial clues, we propose a novel resampling training strategy called Gradually Learn Samples Containing Superficial Clue (GLS-CSC). Through comprehensive evaluations of In-Domain (I.D.), Robustness (Rob.), and Out-Of-Domain (O.O.D.) test sets, we demonstrate that GLS-CSC outperforms existing methods in terms of enhancing the robustness and generalization of Chinese STM…
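
To make the edit distance feature concrete, the following is a minimal sketch of character-level Levenshtein distance applied to a Chinese text pair. It is not taken from the paper: the function names `edit_distance` and `normalized_edit_distance`, the length normalization, and the example sentences are all assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    # Keep the shorter string on the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Hypothetical helper: distance scaled by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Illustrative pair (not from the paper): one inserted character flips the meaning.
s1 = "我喜欢吃苹果"    # "I like eating apples"
s2 = "我不喜欢吃苹果"  # "I do not like eating apples"
print(edit_distance(s1, s2))  # 1
```

The pair above differs by a single character yet is not a semantic match, which is the kind of case where a model that leans on low edit distance as a shortcut for "matched" would be expected to fail.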

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques