To Aggregate or Not to Aggregate. That is the Question: A Case Study on Annotation Subjectivity in Span Prediction

Kemal Kurniawan, Meladel Mistica, Timothy Baldwin, and Jey Han Lau

arXiv:2408.02257 · cs.CL · August 6, 2024

TL;DR

This case study examines how annotation subjectivity affects span prediction in legal problem descriptions, and finds that training on majority-voted spans outperforms training on disaggregated annotations.

Contribution

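The paper provides a case study of how annotation subjectivity affects span prediction in legal NLP, using layperson-written problem descriptions annotated by practising lawyers, and shows the benefit of aggregating annotations by majority vote before training.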

Findings

01

Training on majority-voted spans outperforms training on disaggregated spans.

02

Lawyers often hold different views on the same problem, and this annotation subjectivity influences model performance.

03

Aggregation of annotations enhances prediction reliability.

Abstract

This paper explores the task of automatic prediction of text spans in a legal problem description that support a legal area label. We use a corpus of problem descriptions written by laypeople in English that is annotated by practising lawyers. Inherent subjectivity exists in our task because legal area categorisation is a complex task, and lawyers often have different views on a problem, especially in the face of legally-imprecise descriptions of issues. Experiments show that training on majority-voted spans outperforms training on disaggregated ones.
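The difference between majority-voted and disaggregated training data can be illustrated with a minimal sketch. It assumes, purely for illustration, that each annotator's spans are stored as a binary mask over tokens, that voting happens per token, and that ties count as "not in a span"; the abstract does not state how the paper handles any of these details. The function names `majority_vote_spans` and `disaggregate`, and the sample sentence, are hypothetical.

```python
from typing import List, Tuple


def majority_vote_spans(masks: List[List[int]]) -> List[int]:
    """Aggregate per-annotator token masks (1 = token supports the label).

    Assumption: a token is kept only if a strict majority of annotators
    marked it, so ties resolve to 0.
    """
    if not masks:
        raise ValueError("need at least one annotator mask")
    n_tokens = len(masks[0])
    if any(len(m) != n_tokens for m in masks):
        raise ValueError("all masks must cover the same tokens")
    n_annotators = len(masks)
    # zip(*masks) iterates over tokens, yielding every annotator's vote.
    return [1 if 2 * sum(votes) > n_annotators else 0 for votes in zip(*masks)]


def disaggregate(
    tokens: List[str], masks: List[List[int]]
) -> List[Tuple[List[str], List[int]]]:
    """Keep every annotator's view as its own training example."""
    return [(tokens, mask) for mask in masks]


# Hypothetical problem description with three lawyers' span annotations.
tokens = ["My", "landlord", "refuses", "to", "return", "my", "deposit"]
masks = [
    [0, 1, 1, 0, 1, 0, 1],  # annotator A
    [0, 1, 1, 1, 1, 1, 1],  # annotator B
    [0, 0, 0, 0, 1, 1, 1],  # annotator C
]

aggregated = majority_vote_spans(masks)       # one consensus example
separate = disaggregate(tokens, masks)        # three examples, one per lawyer
print(aggregated)  # [0, 1, 1, 0, 1, 1, 1]
```

In this sketch the aggregated setting yields one training example per description, while the disaggregated setting yields one per annotator, each carrying that annotator's own spans.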

Code & Models

Repositories

kmkurn/wassa2024 (official)

Taxonomy

Topics: Data Mining Algorithms and Applications · Natural Language Processing Techniques