Text Characterization Toolkit
Daniel Simig, Tianlu Wang, Verna Dankers, Peter Henderson, Khuyagbaatar Batsuren, Dieuwke Hupkes, Mona Diab

TL;DR
The paper introduces a toolkit for in-depth analysis of NLP datasets and models, addressing biases and artifacts often overlooked by standard performance metrics.
Contribution
It provides an accessible annotation tool and analysis scripts to study dataset properties and their impact on model behavior, promoting deeper evaluation practices.
Findings
Identification of dataset biases and heuristics
Analysis of difficult examples for models
Case studies across three domains
Abstract
In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, without much deeper analysis. Here, we argue that, especially given the well-known fact that benchmarks often contain biases, artefacts, and spurious correlations, deeper results analysis should become the de-facto standard when presenting new models or benchmarks. We present a tool that researchers can use to study properties of the dataset and the influence of those properties on their models' behaviour. Our Text Characterization Toolkit includes both an easy-to-use annotation tool and off-the-shelf scripts that can be used for specific analyses. We also present use-cases from three different domains: we use the tool to predict which examples are difficult for given well-known trained models and identify (potentially harmful) biases and heuristics…
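The workflow described in the abstract is to annotate each example with text properties and then relate those properties to model behaviour. What follows is a minimal sketch of that general idea, not the toolkit's own code. It assumes a naive whitespace tokenizer and three hand-picked characteristics (token count, type-token ratio, mean token length). It relates each characteristic to per-example correctness using Pearson correlation, which for a 0/1 target is the point-biserial correlation. The function names (characterize, pearson, correlate_with_correctness) are hypothetical and are not the Text Characterization Toolkit's actual API. The example texts and labels are invented toy data.

```python
from math import sqrt


def characterize(text: str) -> dict[str, float]:
    """Compute a few illustrative text characteristics for one example (hypothetical helper)."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    n = len(tokens)
    return {
        "num_tokens": float(n),
        "type_token_ratio": len(set(tokens)) / n if n else 0.0,
        "mean_token_length": sum(len(t) for t in tokens) / n if n else 0.0,
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; with 0/1 ys this equals the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant inputs; report no correlation
    return cov / (sx * sy)


def correlate_with_correctness(texts: list[str], correct: list[bool]) -> dict[str, float]:
    """Correlate each characteristic with whether the model answered the example correctly."""
    feats = [characterize(t) for t in texts]
    ys = [1.0 if c else 0.0 for c in correct]
    return {name: pearson([f[name] for f in feats], ys) for name in feats[0]}


# Toy, made-up data purely for illustration (not results from the paper).
texts = [
    "the cat sat",
    "a dog ran home",
    "the quick brown fox jumps over the lazy sleeping dog today",
    "despite the considerable ambiguity the committee ultimately ratified every single contested provision",
]
correct = [True, True, False, False]

scores = correlate_with_correctness(texts, correct)
for name, r in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {r:+.2f}")
```

On this toy data, num_tokens correlates negatively with correctness because the two longer examples are the ones marked wrong. A strong correlation of this kind would flag a property worth investigating as a possible source of difficulty or a spurious heuristic. It is not evidence of causation on its own.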
