Improving the quality of Persian clinical text with a novel spelling correction system
Seyed Mohammad Sadegh Dashti, Seyedeh Fatemeh Dashti

TL;DR
This paper presents a novel spelling correction system for Persian clinical text that combines a fine-tuned pre-trained model with an orthographic similarity algorithm, PERTO, reaching F1-scores of 90.0% to 91.5% across non-word correction, real-word detection, and real-word correction.
Contribution
The study introduces a new approach integrating a pre-trained model and the PERTO algorithm for improved Persian clinical text spelling correction, addressing language-specific challenges.
Findings
F1-Score of 90.0% for non-word error correction with PERTO
F1-Score of 90.6% for real-word error detection
F1-Score of 91.5% for real-word correction with PERTO
Abstract
Background: Accurate spelling in Electronic Health Records (EHRs) is critical for efficient clinical care, research, and patient safety. The Persian language, with its large vocabulary and complex characteristics, poses unique challenges for real-word error correction. This research aimed to develop an approach for detecting and correcting spelling errors in Persian clinical text.
Methods: Our strategy employs a pre-trained model fine-tuned for spelling correction in the Persian clinical domain. This model is complemented by an orthographic similarity matching algorithm, PERTO, which uses the visual similarity of characters to rank correction candidates.
Results: The evaluation of our approach demonstrated its robustness and precision in detecting and rectifying…
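The abstract describes PERTO only as ranking correction candidates by the visual similarity of characters. A minimal sketch of that general idea follows, assuming a weighted edit distance in which substituting one letter for a similarly shaped one costs less than an arbitrary substitution. The letter groups, the 0.5 cost, and the function names are hypothetical choices for illustration; they are not the PERTO similarity table or the authors' implementation.

```python
# Illustrative groups of Persian letters that share a base shape and differ
# mainly in dots or strokes. Assumption: this is NOT the PERTO similarity table.
SIMILAR_GROUPS = ["بپتث", "جچحخ", "دذ", "رزژ", "سش", "صض", "طظ", "عغ", "فق", "کگ"]
SIMILAR_COST = 0.5  # hypothetical reduced cost for look-alike letters


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: 0 if equal, reduced if visually similar, else 1."""
    if a == b:
        return 0.0
    for group in SIMILAR_GROUPS:
        if a in group and b in group:
            return SIMILAR_COST
    return 1.0


def weighted_distance(source: str, target: str) -> float:
    """Levenshtein distance with shape-aware substitution costs."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        curr = [float(i)]
        for j, t in enumerate(target, 1):
            curr.append(min(
                prev[j] + 1.0,               # delete s
                curr[j - 1] + 1.0,           # insert t
                prev[j - 1] + sub_cost(s, t),  # substitute s -> t
            ))
        prev = curr
    return prev[-1]


def rank_candidates(misspelled: str, candidates: list[str]) -> list[str]:
    """Order candidates from most to least orthographically similar."""
    return sorted(candidates, key=lambda c: weighted_distance(misspelled, c))
```

Under these assumed groups, a candidate that differs from the input only by a look-alike letter (for example س versus ش) is ranked ahead of one that differs by an unrelated letter. In the system described above, such a score would be combined with the fine-tuned model's contextual evidence rather than used alone.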
