# Evaluation and Improvement of Chatbot Text Classification Data Quality Using Plausible Negative Examples

**Authors:** Kit Kuksenok, Andriy Martyniv

arXiv: 1906.01910 · 2019-06-06

## TL;DR

This paper introduces nex-cv, a cross-validation-based metric that uses plausible negative examples to evaluate and improve chatbot text classification datasets. It is validated on seven recruitment-domain datasets in English and German.

## Contribution

The paper presents nex-cv, a model-agnostic evaluation metric that uses negative examples to estimate multi-class classifier performance on small, unbalanced chatbot datasets. The metric is designed to be understandable and usable by non-developer stakeholders.

## Key findings

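- nex-cv is not overly optimistic compared to human ratings, making it a fast method for comparing classifiers
- it is actionable: non-developer staff can use it to improve the dataset
- it allows model-agnostic comparison of systems despite implementation differences
- it was validated on seven recruitment-domain datasets in English and German over the course of one year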

## Abstract

We describe and validate a metric for estimating multi-class classifier performance based on cross-validation and adapted for improvement of small, unbalanced natural-language datasets used in chatbot design. Our experiences draw upon building recruitment chatbots that mediate communication between job-seekers and recruiters by exposing the ML/NLP dataset to the recruiting team. Evaluation approaches must be understandable to various stakeholders, and useful for improving chatbot performance. The metric, nex-cv, uses negative examples in the evaluation of text classification, and fulfils three requirements. First, it is actionable: it can be used by non-developer staff. Second, it is not overly optimistic compared to human ratings, making it a fast method for comparing classifiers. Third, it allows model-agnostic comparison, making it useful for comparing systems despite implementation differences. We validate the metric based on seven recruitment-domain datasets in English and German over the course of one year.
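The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of cross-validation with negative examples, not the authors' nex-cv algorithm. It assumes that in each fold some categories are withheld from training and that their test items count as correct only when the classifier declines to answer (returns `None`). The toy word-overlap classifier, the function names (`train`, `predict`, `negative_example_cv`), the parameters (`k`, `held_out`, `threshold`), and the sample data are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def train(examples):
    """Build a bag-of-words profile per category from (text, label) pairs."""
    profiles = defaultdict(Counter)
    for text, label in examples:
        profiles[label].update(text.lower().split())
    return profiles


def predict(profiles, text, threshold=0.5):
    """Return the best-matching label, or None if no category is confident."""
    tokens = text.lower().split()
    if not tokens:
        return None
    best_label, best_score = None, 0.0
    for label, profile in profiles.items():
        # Score = fraction of query tokens seen in this category's training text.
        score = sum(1 for t in tokens if t in profile) / len(tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


def negative_example_cv(data, k=3, held_out=1, seed=0):
    """Mean accuracy over k folds (assumed scheme, not the paper's definition).

    In each fold, `held_out` randomly chosen categories are removed from the
    training data; their test items are negative examples whose expected
    prediction is None.
    """
    rng = random.Random(seed)
    items = list(data)
    rng.shuffle(items)
    labels = sorted({label for _, label in items})
    scores = []
    for fold in range(k):
        negatives = set(rng.sample(labels, held_out))
        test = items[fold::k]
        train_items = [
            ex for i, ex in enumerate(items)
            if i % k != fold and ex[1] not in negatives
        ]
        profiles = train(train_items)
        correct = 0
        for text, label in test:
            expected = None if label in negatives else label
            correct += predict(profiles, text) == expected
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical toy dataset in the recruitment domain.
DATA = [
    ("what is the salary", "salary"),
    ("how much is the salary", "salary"),
    ("salary range for this job", "salary"),
    ("where is the office", "location"),
    ("office location please", "location"),
    ("is the office in berlin", "location"),
    ("how do i apply", "apply"),
    ("where can i apply", "apply"),
    ("apply for this job", "apply"),
]

print(round(negative_example_cv(DATA), 3))
```

Because the score only depends on predictions and expected labels, the `train` and `predict` functions can be swapped for any classifier that is able to abstain, which is in the spirit of the model-agnostic comparison the abstract describes.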

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.01910/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1906.01910/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1906.01910/full.md

---
Source: https://tomesphere.com/paper/1906.01910