Writing Style Matters: An Examination of Bias and Fairness in Information Retrieval Systems
Hongliu Cao

TL;DR
This paper investigates how biases related to writing styles in text embedding models affect fairness in information retrieval systems, revealing style preferences that can marginalize certain communication forms.
Contribution
It uncovers style-based biases in state-of-the-art embedding models and analyzes their impact on fairness and retrieval performance in IR systems.
Findings
Different embedding models prefer different document writing styles.
Informal and emotive styles are less favored by most models.
Biases influence retrieval and answer styles, affecting fairness.
Abstract
The rapid advancement of Language Model technologies has opened new opportunities, but also introduced new challenges related to bias and fairness. This paper explores the uncharted territory of potential biases in state-of-the-art universal text embedding models towards specific document and query writing styles within Information Retrieval (IR) systems. Our investigation reveals that different embedding models exhibit different preferences for document writing styles, while more informal and emotive styles are less favored by most embedding models. In terms of query writing styles, many embedding models tend to match the style of the query with the style of the retrieved documents, but some show a consistent preference for specific styles. Text embedding models fine-tuned on synthetic data generated by LLMs display a consistent preference for certain styles of generated data. These…
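As a rough illustration of the kind of measurement the abstract describes, the sketch below scores style variants of the same document against a query and counts which style ranks first. It is not the paper's protocol: `toy_embed` is a hypothetical bag-of-words stand-in for a real text embedding model, and `style_win_counts`, the style labels, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def style_win_counts(queries, docs_by_style, embed=toy_embed):
    """Count, per style, how often that style's variant scores highest.

    docs_by_style maps a style label to a list of documents, where the
    i-th document in every list carries the same content as the others
    and is the intended answer to the i-th query.
    """
    wins = Counter()
    for i, query in enumerate(queries):
        q_vec = embed(query)
        scores = {
            style: cosine(q_vec, embed(variants[i]))
            for style, variants in docs_by_style.items()
        }
        wins[max(scores, key=scores.get)] += 1
    return wins


if __name__ == "__main__":
    # Made-up sample data: one query, one document written in two styles.
    queries = ["how do vaccines work"]
    docs_by_style = {
        "formal": ["Vaccines work by training the immune system"],
        "informal": ["so basically vaccines kinda teach your body to fight stuff"],
    }
    print(style_win_counts(queries, docs_by_style))
```

With a real embedding model passed as `embed`, win counts that stay far from uniform across many content-matched documents would point to a document-style preference. Rewriting the queries in each style and repeating the count would probe the query-style matching behavior the abstract mentions.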
