Error-Robust Retrieval for Chinese Spelling Check

Xunjian Yin; Xinyu Hu; Jin Jiang; Xiaojun Wan

arXiv:2211.07843 · cs.CL · February 27, 2024


TL;DR

This paper introduces RERIC, a retrieval-based method that enhances Chinese Spelling Check models by leveraging multimodal representations and error-robust information, substantially improving performance on the SIGHAN benchmark datasets.

Contribution

The paper proposes a plug-and-play retrieval approach with multimodal features and reranking for Chinese Spelling Check, addressing the scarcity of annotated data and improving robustness to spelling errors.
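This summary does not say exactly how retrieved results are combined with an existing model's predictions. One common way to make retrieval plug-and-play, borrowed from kNN-LM and kNN-MT, is to turn the distances of retrieved neighbors into a distribution over characters and linearly interpolate it with the base model's output distribution. The sketch assumes that scheme; the function names, the temperature, and the mixing weight `lam` are hypothetical and not taken from the RERIC code, and the paper's reranking step is not modeled.

```python
import numpy as np


def knn_distribution(neighbor_targets, neighbor_distances, vocab, temperature=1.0):
    """Turn retrieved (target character, distance) pairs into a distribution over vocab.

    Assumption: closer neighbors get larger weight via a softmax over negative distances.
    """
    logits = -np.asarray(neighbor_distances, dtype=float) / temperature
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    index = {tok: i for i, tok in enumerate(vocab)}
    p = np.zeros(len(vocab))
    for tok, w in zip(neighbor_targets, weights):
        p[index[tok]] += w  # neighbors sharing a target character pool their weight
    return p


def interpolate(p_model, p_knn, lam=0.3):
    """Mix the base model's distribution with the retrieval distribution.

    The base CSC model itself is untouched; only its output probabilities are combined.
    """
    return (1.0 - lam) * np.asarray(p_model, dtype=float) + lam * np.asarray(p_knn, dtype=float)
```

Because only output distributions are mixed, the base CSC model needs no retraining, which is what makes this style of retrieval attachable to an existing model.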

Findings

1. Achieves substantial improvements on SIGHAN benchmarks.

2. Effectively leverages training data with multimodal representations.

3. Enhances error robustness in Chinese Spelling Check models.

Abstract

Chinese Spelling Check (CSC) aims to detect and correct erroneous tokens in Chinese text and has a wide range of applications. However, it faces two challenges: annotated data is insufficient, and previous methods may not fully leverage the existing datasets. In this paper, we introduce our plug-and-play retrieval method with error-robust information for Chinese Spelling Check (RERIC), which can be directly applied to existing CSC models. The datastore for retrieval is built entirely from the training data, with elaborate designs tailored to the characteristics of CSC. Specifically, we employ multimodal representations that fuse phonetic, morphologic, and contextual information when computing the query and key during retrieval, to enhance robustness against potential errors. Furthermore, in order to better judge the retrieved candidates,…
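To make the datastore idea concrete, here is a minimal sketch, assuming each character position in the training data yields three feature vectors (phonetic, morphologic, contextual) that are fused by concatenation and L2 normalization, and that each fused key maps to the correct target character. The fusion rule, cosine similarity, softmax weighting, and all names (`fuse`, `build_datastore`, `retrieve`, `DIM`) are illustrative assumptions; the actual encoders and fusion used in RERIC may differ.

```python
import numpy as np

DIM = 8  # hypothetical size of each modality vector


def fuse(phonetic, morphologic, contextual):
    """Fuse three modality vectors into one L2-normalized key or query (assumed: concatenation)."""
    v = np.concatenate([phonetic, morphologic, contextual])
    return v / np.linalg.norm(v)


def build_datastore(examples):
    """Build keys and values from training data.

    examples: list of (phonetic, morphologic, contextual, target_char) tuples,
    one per character position in the training set.
    """
    keys = np.stack([fuse(p, m, c) for p, m, c, _ in examples])
    values = [ex[3] for ex in examples]
    return keys, values


def retrieve(keys, values, query, k=3, temperature=1.0):
    """Return a distribution over target characters from the k most similar keys.

    Keys and query are unit vectors, so the dot product is cosine similarity.
    """
    sims = keys @ query
    top = np.argsort(-sims)[:k]
    weights = np.exp(sims[top] / temperature)
    weights /= weights.sum()
    dist = {}
    for idx, w in zip(top, weights):
        dist[values[idx]] = dist.get(values[idx], 0.0) + float(w)
    return dist
```

The intuition the abstract points to is that a misspelled character usually resembles its correction in sound or shape, so including phonetic and morphologic features in the key helps a query that contains an error still land near training examples whose value is the correct character.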


Code & Models

Repositories

arvid-pku/reric (official)


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques