Improving the quality of Persian clinical text with a novel spelling correction system

Seyed Mohammad Sadegh Dashti; Seyedeh Fatemeh Dashti

arXiv:2408.03622·cs.CL·August 8, 2024

TL;DR

This paper presents a spelling correction system for Persian clinical text that combines a fine-tuned pre-trained model with an orthographic similarity algorithm, PERTO. It reports F1-scores of 90.0% for non-word error correction, 90.6% for real-word error detection, and 91.5% for real-word error correction.

Contribution

The study introduces a new approach integrating a pre-trained model and the PERTO algorithm for improved Persian clinical text spelling correction, addressing language-specific challenges.

Findings

1. F1-score of 90.0% for non-word error correction with PERTO

2. F1-score of 90.6% for real-word error detection

3. F1-score of 91.5% for real-word correction with PERTO

Abstract

Background: The accuracy of spelling in Electronic Health Records (EHRs) is a critical factor for efficient clinical care, research, and ensuring patient safety. The Persian language, with its abundant vocabulary and complex characteristics, poses unique challenges for real-word error correction. This research aimed to develop an innovative approach for detecting and correcting spelling errors in Persian clinical text.

Methods: Our strategy employs a state-of-the-art pre-trained model that has been meticulously fine-tuned specifically for the task of spelling correction in the Persian clinical domain. This model is complemented by an innovative orthographic similarity matching algorithm, PERTO, which uses visual similarity of characters for ranking correction candidates.

Results: The evaluation of our approach demonstrated its robustness and precision in detecting and rectifying…
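To make the idea of ranking correction candidates by visual similarity of characters more concrete, here is a minimal Python sketch. It is not the PERTO algorithm, whose details are not given in this listing. It assumes a plain weighted edit distance in which substituting one letter for a visually similar one costs less than an arbitrary substitution. The letter groups, the 0.4 cost, and the function names `substitution_cost`, `weighted_edit_distance`, and `rank_candidates` are all hypothetical choices made for illustration.

```python
# Illustrative groups of Persian letters that share a base shape and differ
# mainly by dots or strokes. The groups and the 0.4 cost are assumptions
# made for this sketch; they are NOT the PERTO similarity table.
SIMILAR_GROUPS = ["بپتث", "جچحخ", "دذ", "رزژ", "سش", "صض", "طظ", "عغ", "فق", "کگ"]
SIMILAR_COST = 0.4

# Map each letter to the index of its group for fast lookup.
GROUP_OF = {ch: i for i, group in enumerate(SIMILAR_GROUPS) for ch in group}


def substitution_cost(a, b):
    """Cost of replacing letter a with letter b."""
    if a == b:
        return 0.0
    group_a, group_b = GROUP_OF.get(a), GROUP_OF.get(b)
    if group_a is not None and group_a == group_b:
        return SIMILAR_COST  # visually similar letters are cheap to confuse
    return 1.0


def weighted_edit_distance(source, target):
    """Levenshtein distance with similarity-weighted substitutions."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [float(i)]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete s
                curr[j - 1] + 1.0,                      # insert t
                prev[j - 1] + substitution_cost(s, t),  # substitute s -> t
            ))
        prev = curr
    return prev[-1]


def rank_candidates(misspelling, candidates):
    """Return candidates ordered from most to least orthographically similar."""
    return sorted(candidates, key=lambda c: weighted_edit_distance(misspelling, c))


if __name__ == "__main__":
    # "سکر" is a misspelling one letter away from both candidates.
    print(rank_candidates("سکر", ["سفر", "شکر"]))
```

In this sketch, "شکر" ranks ahead of "سفر" for the input "سکر" because س and ش fall in the same assumed group (cost 0.4), while ک and ف do not (cost 1.0). A system like the one described would presumably combine such a score with the contextual predictions of the fine-tuned model, but how the two are combined is not stated here.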
