Technical Report: Impact of Position Bias on Language Models in Token Classification
Mehdi Ben Amor, Michael Granitzer, Jelena Mitrović

TL;DR
This paper examines how position bias affects language model performance in token classification tasks such as NER and POS tagging. It reports performance drops of 3-9% and proposes mitigation methods that recover approximately 2%.
Contribution
It introduces an evaluation approach for position bias in transformer models and proposes two training techniques to reduce its impact.
Findings
Position bias causes 3-9% performance drops in LMs.
Mitigation methods improve performance by approximately 2%.
Evaluation across multiple benchmarks confirms the bias effect.
Abstract
Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) tasks. Downstream tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data imbalance issues, particularly regarding the ratio of positive to negative examples and class disparities. This paper investigates an often-overlooked issue of encoder models, specifically the position bias of positive examples in token classification tasks. For completeness, we also include decoders in the evaluation. We evaluate the impact of position bias using different position embedding techniques, focusing on BERT with Absolute Position Embedding (APE), Relative Position Embedding (RPE), and Rotary Position Embedding (RoPE). Therefore, we conduct an in-depth evaluation of the impact of position bias on the performance of LMs when fine-tuned on token…
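To illustrate what "position bias of positive examples" can look like in a tagged corpus, here is a minimal sketch that counts how many non-negative tags fall into each position range. It is not the authors' evaluation code. The function name `positive_position_histogram`, the `bin_size` parameter, the choice of `"O"` as the negative tag, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def positive_position_histogram(tag_sequences, bin_size=10, negative_tag="O"):
    """Count positive (non-negative) tags per position bin.

    tag_sequences: list of tag lists, one per sentence (hypothetical format).
    bin_size: number of consecutive token positions grouped into one bin.
    negative_tag: label treated as the negative class (assumed to be "O").
    """
    counts = Counter()
    for tags in tag_sequences:
        for position, tag in enumerate(tags):
            if tag != negative_tag:
                # Integer division maps a token position to its bin index.
                counts[position // bin_size] += 1
    # Return bins in ascending order for easier reading.
    return dict(sorted(counts.items()))


# Toy data (invented for illustration): most entities sit near the start.
toy_corpus = [
    ["B-PER", "I-PER", "O", "O", "O", "O"],
    ["O", "B-LOC", "O", "O", "O", "O", "O", "B-ORG"],
]
hist = positive_position_histogram(toy_corpus, bin_size=4)
print(hist)
```

A histogram that is heavily skewed toward the first bins would suggest that a model fine-tuned on such data rarely sees positive labels at later positions, which is the kind of imbalance the abstract describes.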
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Multi-Head Attention · Absolute Position Encodings · Label Smoothing · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Byte Pair Encoding · Transformer · Adam · Layer Normalization
