Pairwise Comparison for Bias Identification and Quantification
Fabian Haak, Philipp Schaer

TL;DR
This paper explores the use of pairwise comparison techniques to efficiently identify and quantify linguistic bias in online media, aiming to create high-quality datasets and improve bias measurement methods.
Contribution
It introduces optimized pairwise comparison methods and evaluates their effectiveness through simulations and real data, providing a practical approach for bias quantification.
Findings
Pairwise comparison is effective for bias quantification.
Simulation results show robustness and cost-efficiency.
Real data application validates the approach's practicality.
Abstract
Linguistic bias in online news and social media is widespread, yet its identification and quantification remain difficult due to subjectivity, context dependence, and the scarcity of high-quality gold-label datasets. We aim to reduce annotation effort by leveraging pairwise comparison for bias annotation. Because exhaustive pairwise comparison is costly, we evaluate more efficient implementations of pairwise comparison-based rating. We do this by investigating the effects of various rating techniques and the parameters of three cost-aware alternatives in a simulation environment. Since the approach can in principle be applied to both human and large language model annotation, our work provides a basis for creating high-quality benchmark datasets and for quantifying biases and other subjective linguistic aspects. The controlled simulations include latent…
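To make the idea of pairwise comparison-based rating in a simulation concrete, the following is a minimal sketch, not the authors' implementation. It assumes each item has a hidden (latent) bias score, that a simulated annotator picks the "more biased" item of a pair with Bradley-Terry probability, and that scores are recovered with the standard minorization-maximization update for the Bradley-Terry model. All function names, the random pair sampling, the pseudo-count, and the parameter values are illustrative assumptions; the cost-aware alternatives studied in the paper are not reproduced here.

```python
import math
import random


def simulate_comparisons(latent, n_pairs, rng):
    """Sample random pairs; item i beats item j with probability
    sigmoid(latent[i] - latent[j]) (Bradley-Terry assumption)."""
    n = len(latent)
    outcomes = []  # list of (winner, loser) index pairs
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)
        p_i_wins = 1.0 / (1.0 + math.exp(-(latent[i] - latent[j])))
        outcomes.append((i, j) if rng.random() < p_i_wins else (j, i))
    return outcomes


def fit_bradley_terry(n_items, outcomes, n_iter=200, pseudo=0.1):
    """Estimate log-strengths with the MM update for Bradley-Terry.
    `pseudo` is a heuristic pseudo-count (an assumption of this sketch)
    that keeps items with zero wins at a finite log-strength."""
    wins = [pseudo] * n_items
    pair_counts = {}
    for w, l in outcomes:
        wins[w] += 1
        key = (min(w, l), max(w, l))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = [1.0] * n_items
    for _ in range(n_iter):
        denom = [0.0] * n_items
        for (i, j), c in pair_counts.items():
            d = c / (strength[i] + strength[j])
            denom[i] += d
            denom[j] += d
        # Items that were never compared keep their current strength.
        strength = [wins[k] / denom[k] if denom[k] > 0 else strength[k]
                    for k in range(n_items)]
        total = sum(strength)
        strength = [s * n_items / total for s in strength]  # fix the scale
    return [math.log(s) for s in strength]


def spearman(a, b):
    """Spearman rank correlation; assumes no tied values."""
    def ranks(x):
        order = sorted(range(len(x)), key=lambda k: x[k])
        r = [0] * len(x)
        for rank, k in enumerate(order):
            r[k] = rank
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(30)]        # hidden "true" bias scores
outcomes = simulate_comparisons(latent, 600, rng)    # annotation budget: 600 pairs
estimated = fit_bradley_terry(len(latent), outcomes)
print("rank agreement with latent scores:", round(spearman(latent, estimated), 3))
```

Varying `n_pairs` in such a sketch gives a rough sense of the trade-off between annotation cost and how well the recovered ranking matches the latent scores, which is the kind of question a simulation environment can answer before real annotators are involved.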
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining
