Understanding and Tackling Label Errors in Individual-Level Nature Language Understanding

Yunpeng Xiao; Youpeng Zhao; Kai Shu

arXiv:2502.13297·cs.CL·February 20, 2025

TL;DR

This paper highlights the importance of considering individual-level factors in natural language understanding tasks, introduces a new annotation guideline, and demonstrates improved accuracy with large language models on re-annotated datasets.

Contribution

It proposes a novel annotation guideline incorporating individual factors for more accurate dataset creation in individual-level NLU tasks.

Findings

01

Error rates in the two re-annotated datasets (stance detection and topic-based sentiment analysis) were as high as 31.7% and 23.3%.

02

Large language models achieved over 87% accuracy on re-annotated datasets.

03

Adding individual factors improves model performance and annotation quality.
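
The error rates above can be read as the share of samples whose label changed under re-annotation. The sketch below is illustrative only: it assumes the error rate is the fraction of samples whose original label differs from the re-annotated one, which this page does not spell out, and the function name and labels are hypothetical.

```python
def label_error_rate(original_labels, reannotated_labels):
    """Fraction of samples whose original label differs from the re-annotated one.

    Assumption: the re-annotated label is treated as the reference, so any
    disagreement counts as an error in the original dataset.
    """
    if len(original_labels) != len(reannotated_labels):
        raise ValueError("label lists must have the same length")
    if not original_labels:
        raise ValueError("label lists must not be empty")
    # Count positions where the two annotation rounds disagree.
    changed = sum(
        1 for old, new in zip(original_labels, reannotated_labels) if old != new
    )
    return changed / len(original_labels)


# Hypothetical stance labels for four posts, before and after re-annotation.
original = ["favor", "against", "neutral", "favor"]
reannotated = ["favor", "favor", "neutral", "against"]
rate = label_error_rate(original, reannotated)
```

With these toy labels, two of the four samples disagree, so the rate is 0.5.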

Abstract

Natural language understanding (NLU) is a task that enables machines to understand human language. Some tasks, such as stance detection and sentiment analysis, are closely tied to individual subjective perspectives and are thus termed individual-level NLU. Previously, these tasks have often been simplified to text-level NLU tasks, ignoring individual factors. This not only makes inference difficult and unexplainable but also often results in a large number of label errors when creating datasets. To address these limitations, we propose a new NLU annotation guideline based on individual-level factors. Specifically, we incorporate other posts by the same individual and annotate that individual's subjective perspective only after considering all of their posts. We use this guideline to expand and re-annotate the stance detection and topic-based sentiment analysis datasets. We find that error rates in the…
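
The guideline described in the abstract, annotating a post alongside the same individual's other posts, can be sketched as a small data-preparation step. This is a minimal illustration and not the authors' pipeline: the `Post` type, the function names, and the `max_other_posts` cap are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author_id: str  # hypothetical identifier for the individual
    text: str


def group_posts_by_author(posts):
    """Map each author_id to the list of that author's post texts."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post.author_id].append(post.text)
    return dict(grouped)


def build_annotation_context(target, posts_by_author, max_other_posts=5):
    """Bundle a target post with other posts by the same individual.

    Assumption: an annotator (or model) sees the target text plus up to
    max_other_posts other texts from the same author before labeling.
    """
    others = [
        text
        for text in posts_by_author.get(target.author_id, [])
        if text != target.text
    ]
    return {"target": target.text, "other_posts": others[:max_other_posts]}


posts = [
    Post("u1", "Great, another policy announcement."),
    Post("u1", "I have supported this policy for years."),
    Post("u2", "Not sure what to think about it."),
]
context = build_annotation_context(posts[0], group_posts_by_author(posts))
```

Here the first post on its own could read as sarcastic or sincere; the second post by the same author is the kind of individual-level evidence the guideline asks annotators to consider.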

Code & Models

Repositories

24yearsoldstudent/individual-nlu
none · Official

Taxonomy

Topics: Child and Animal Learning Development · Design Education and Practice · Multi-Criteria Decision Making