Perturbations and Subpopulations for Testing Robustness in Token-Based Argument Unit Recognition

Jonathan Kamp; Lisa Beinborn; Antske Fokkens

arXiv:2209.14780 · cs.CL · September 30, 2022 · 1 citation

Open Access 1 Repo

TL;DR

This paper compares the robustness of token-based and sentence-based argument unit recognition systems and finds that token-based models are generally more robust, both on perturbed examples and on specific subpopulations of the data.

Contribution

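The paper reproduces the earlier study that first claimed token-level fine-tuning is more robust, and it introduces systematic tests for analysing behavioural differences between the two systems. The results indicate that token-based models are generally the more robust of the two.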

Findings

01

Token-based models are generally more robust than sentence-based models on manually perturbed examples.

02

Token-based systems are more robust across specific data subpopulations.

03

Systematic analysis reveals behavioral differences between the two approaches.

Abstract

Argument Unit Recognition and Classification aims at identifying argument units from text and classifying them as pro or against. One of the design choices that need to be made when developing systems for this task is what the unit of classification should be: segments of tokens or full sentences. Previous research suggests that fine-tuning language models at the token level yields more robust results for classifying sentences compared to training on sentences directly. We reproduce the study that originally made this claim and further investigate what exactly token-based systems learned better compared to sentence-based ones. We develop systematic tests for analysing the behavioural differences between the token-based and the sentence-based system. Our results show that token-based models are generally more robust than sentence-based models both on manually perturbed examples and on…
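To make the two ideas in the abstract concrete, the following is a minimal sketch and not the authors' code. It assumes three label names (`pro`, `con`, `non`), a hypothetical majority-vote rule for turning token-level labels into a sentence label, and a hypothetical stability score that counts how often a classifier's prediction stays the same after a perturbation. None of these names or rules come from the paper; they only illustrate the kind of test the abstract describes.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence, Tuple

# Assumed label set for illustration only: "pro", "con", and "non" (non-argumentative).
NON_ARGUMENT = "non"


def sentence_label_from_tokens(token_labels: Sequence[str]) -> str:
    """Hypothetical aggregation: majority vote over argumentative token labels.

    Returns NON_ARGUMENT when no token carries an argumentative label.
    """
    counts = Counter(label for label in token_labels if label != NON_ARGUMENT)
    if not counts:
        return NON_ARGUMENT
    return counts.most_common(1)[0][0]


def prediction_stability(
    predict: Callable[[str], str],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (original, perturbed) pairs whose predicted label is unchanged.

    This is appropriate only for perturbations that should preserve the label.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (original, perturbed) pair is required")
    unchanged = sum(predict(original) == predict(perturbed) for original, perturbed in pairs)
    return unchanged / len(pairs)
```

A sentence classifier wrapped as `predict` can then be scored on a list of label-preserving perturbation pairs; a higher stability value would suggest, under these assumptions, a more robust model. The same function could be applied separately to each data subpopulation to compare systems on that subset.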

Code & Models

Repositories

jbkamp/repo-rob-token-aur
PyTorch · Official

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques