On the Effects of Regional Spelling Conventions in Retrieval Models

Andreas Chari; Sean MacAvaney; Iadh Ounis

arXiv:2308.00480 · cs.IR · August 2, 2023

TL;DR

This paper investigates how regional spelling differences, such as color versus colour, affect neural retrieval models. It finds that models usually generalize well despite spelling biases in their training data, but that spelling normalization affects performance differently across model types.

Contribution

It provides a systematic analysis of the impact of regional spelling conventions on neural retrieval models and the effects of spelling normalization on their performance.

Findings

01

American spelling conventions are more prevalent in datasets.

02

Models usually generalize well despite spelling biases in the training data.

03

Normalization affects models differently, with lexical models improving and dense retrievers unaffected.
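Finding 03 concerns document spelling normalization: rewriting a document so that every regional variant follows a single convention. The sketch below shows one simple dictionary-based way to do this. It assumes a tiny hypothetical British/American word map (UK_TO_US is illustrative only, not the paper's word list), and normalize_spelling is a hypothetical helper, not the API of the andreaschari/regional_spelling repository.

```python
import re

# Hypothetical, tiny British -> American map for illustration only;
# it is NOT the word list used in the paper.
UK_TO_US = {
    "colour": "color",
    "favour": "favor",
    "organise": "organize",
    "normalise": "normalize",
    "centre": "center",
}
US_TO_UK = {us: uk for uk, us in UK_TO_US.items()}

_WORD = re.compile(r"[A-Za-z]+")


def _match_case(source: str, target: str) -> str:
    """Copy simple casing (UPPER, Title, lower) from source onto target."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def normalize_spelling(text: str, convention: str = "US") -> str:
    """Rewrite known regional variants in text to one convention ("US" or "UK")."""
    mapping = UK_TO_US if convention == "US" else US_TO_UK

    def replace(match: re.Match) -> str:
        word = match.group(0)
        target = mapping.get(word.lower())
        # Words not in the map are left exactly as they were.
        return _match_case(word, target) if target else word

    return _WORD.sub(replace, text)
```

For example, normalize_spelling("The Colour of the centre", "US") returns "The Color of the center". A plain word lookup like this leaves inflected forms such as "colours" untouched unless they are added to the map, and it ignores context, so a realistic normalizer would need a much larger, curated variant list.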

Abstract

One advantage of neural ranking models is that they are meant to generalise well in situations of synonymity, i.e., where two words have similar or identical meanings. In this paper, we investigate and quantify how well various ranking models perform in a clear-cut case of synonymity: when words are simply expressed in different surface forms due to regional differences in spelling conventions (e.g., color vs colour). We first explore the prevalence of American and British English spelling conventions in datasets used for the pre-training, training and evaluation of neural retrieval methods, and find that American spelling conventions are far more prevalent. Despite these biases in the training data, we find that retrieval models often generalise well in this case of synonymity. We explore the effect of document spelling normalisation in retrieval and observe that all models are affected…
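The abstract's first step measures how prevalent each spelling convention is in a dataset. A minimal sketch of such a count, assuming a small hypothetical list of (American, British) variant pairs; VARIANT_PAIRS and count_conventions are illustrative names, and this is not the authors' procedure or word list:

```python
import re
from collections import Counter

# Hypothetical (American, British) variant pairs, for illustration only.
VARIANT_PAIRS = [
    ("color", "colour"),
    ("favor", "favour"),
    ("organize", "organise"),
    ("center", "centre"),
]
US_WORDS = {us for us, _ in VARIANT_PAIRS}
UK_WORDS = {uk for _, uk in VARIANT_PAIRS}


def count_conventions(documents):
    """Count tokens matching American vs British variants across documents."""
    counts = Counter()
    for doc in documents:
        # Lowercase and split on runs of letters; other tokens are ignored.
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token in US_WORDS:
                counts["US"] += 1
            elif token in UK_WORDS:
                counts["UK"] += 1
    return counts
```

For the three documents "The color of the center", "A colour chart" and "Organize by color", this counts four American variants and one British variant. Counting only over a fixed variant list keeps the comparison symmetric: every word that could be spelled either way contributes to exactly one side.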

Code & Models

Repositories

andreaschari/regional_spelling
pytorch · Official