Spelling Correction in Healthcare Query-Answer Systems: Methods, Retrieval Impact, and Empirical Evaluation
Saurabh K Singh

TL;DR
This study evaluates the impact of spelling correction methods on healthcare question-answering systems, demonstrating that query-side correction significantly improves retrieval effectiveness for real-world medical queries.
Contribution
It provides the first controlled empirical analysis of spelling correction as a preprocessing step in healthcare QA using real consumer queries, highlighting the importance of query-side correction.
Findings
Query correction improves retrieval metrics significantly.
Correcting only the query yields more benefit than correcting the corpus.
61.5% of real medical queries contain at least one spelling error.
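The two census figures (61.5% of queries, 11.0% of tokens) are counted at different granularities: a single misspelled token flags the whole query. The sketch below computes both rates and assumes a simple lexicon-lookup error detector. The lexicon, the tokenizer, and the function names are hypothetical illustrations, not the study's census procedure.

```python
import re

# Hypothetical toy lexicon (assumption). A real census would need a large
# general-English plus medical vocabulary.
LEXICON = {"what", "are", "symptoms", "of", "diabetes", "dose", "for",
           "ibuprofen", "how", "to", "treat", "a", "migraine"}

def tokenize(query: str) -> list[str]:
    # Lowercase and keep alphabetic runs only (simplifying assumption).
    return re.findall(r"[a-z]+", query.lower())

def error_census(queries: list[str], lexicon: set[str]) -> tuple[float, float]:
    """Return (query-level error rate, token-level error rate).

    A token counts as an error if it is absent from the lexicon. A query
    counts as erroneous if it contains at least one such token.
    """
    total_tokens = 0
    error_tokens = 0
    queries_with_error = 0
    for q in queries:
        tokens = tokenize(q)
        n_err = sum(1 for t in tokens if t not in lexicon)
        total_tokens += len(tokens)
        error_tokens += n_err
        if n_err > 0:
            queries_with_error += 1
    query_rate = queries_with_error / len(queries) if queries else 0.0
    token_rate = error_tokens / total_tokens if total_tokens else 0.0
    return query_rate, token_rate

sample = ["what are symptoms of diabetis", "dose for ibuprofin",
          "how to treat a migraine"]
q_rate, t_rate = error_census(sample, LEXICON)
print(f"{q_rate:.3f} {t_rate:.3f}")  # prints "0.667 0.154"
```

In this toy sample, two of the thirteen tokens are misspelled. Because each sits in a different query, two of the three queries are flagged. This is why a query-level rate can be several times the token-level rate.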
Abstract
Healthcare question-answering (QA) systems face a persistent challenge: users submit queries with spelling errors at rates substantially higher than those found in the professional documents they search. This paper presents the first controlled study of spelling correction as a retrieval preprocessing step in healthcare QA using real consumer queries. We conduct an error census across two public datasets -- the TREC 2017 LiveQA Medical track (104 consumer health questions) and HealthSearchQA (4,436 health queries from Google autocomplete) -- finding that 61.5% of real medical queries contain at least one spelling error, with a token-level error rate of 11.0%. We evaluate four correction methods -- conservative edit distance, standard edit distance (Levenshtein), context-aware candidate ranking, and SymSpell -- across three experimental conditions: uncorrected queries against an…
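The edit-distance methods can be illustrated as a dictionary lookup ranked by Levenshtein distance. This sketch assumes that "conservative" means a maximum distance of 1 and "standard" means a maximum of 2. These thresholds, the toy lexicon, and the helper names are assumptions for illustration, not the paper's settings. Context-aware ranking and SymSpell are not shown.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert, delete, substitute),
    # keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def correct_token(token: str, lexicon: set[str], max_dist: int) -> str:
    """Return the closest lexicon word within max_dist, else the token unchanged."""
    if token in lexicon:
        return token
    best, best_d = token, max_dist + 1
    for word in sorted(lexicon):  # sorted so ties resolve deterministically
        d = levenshtein(token, word)
        if d < best_d:
            best, best_d = word, d
    return best

def correct_query(query: str, lexicon: set[str], max_dist: int) -> str:
    # Correct each whitespace-separated token independently (no context used).
    return " ".join(correct_token(t, lexicon, max_dist)
                    for t in query.lower().split())

# Hypothetical toy lexicon (assumption).
LEXICON = {"what", "are", "symptoms", "of", "diabetes"}
query = "what are simptums of diabetis"
print(correct_query(query, LEXICON, max_dist=1))  # what are simptums of diabetes
print(correct_query(query, LEXICON, max_dist=2))  # what are symptoms of diabetes
```

With `max_dist=1`, "simptums" is left untouched because it is two substitutions away from "symptoms". With `max_dist=2`, it is corrected. A looser threshold fixes more errors but also risks rewriting rare valid terms, which is the trade-off a conservative setting is meant to limit. The linear scan performs one distance computation per lexicon word for each token. SymSpell's symmetric-delete approach avoids this scan by precomputing deletion variants of dictionary terms, so candidates can be found by hash lookup.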
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Health Literacy and Information Accessibility
