# Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues

**Authors:** Shachi Paul, Rahul Goel, Dilek Hakkani-Tür

arXiv: 1907.03020 · 2019-07-09

## TL;DR

This paper introduces a universal dialogue act schema for task-oriented dialogues and aligns existing annotated datasets with it, in order to train a universal tagger (U-DAT) that labels human-human conversations with little or no manual annotation.

## Contribution

The paper proposes a universal dialogue act schema and methods to align diverse datasets, enabling training of a universal tagger for human-human task-oriented dialogues.
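
To make the alignment idea concrete, here is a minimal sketch of manual schema alignment: each source dataset gets a lookup table from its own act labels to a shared label set, and all corpora are relabeled before being pooled for training. The dataset names, act labels, and the `other` fallback are hypothetical placeholders chosen for illustration; they are not the schema or datasets used in the paper.

```python
from collections import Counter

# Hypothetical alignment tables: neither the dataset names nor the act labels
# come from the paper; they only illustrate a manual label mapping.
ALIGNMENT = {
    "dataset_a": {"inform": "inform", "request": "request", "bye": "goodbye"},
    "dataset_b": {"provide_info": "inform", "ask": "request", "closing": "goodbye"},
}


def to_universal(dataset, act, default="other"):
    """Map a dataset-specific act label to the shared schema.

    Labels with no entry in the table fall back to `default` (an assumption
    made here for simplicity).
    """
    return ALIGNMENT.get(dataset, {}).get(act, default)


def align_corpus(turns):
    """Relabel (dataset, utterance, act) triples with shared-schema acts."""
    return [(utt, to_universal(ds, act)) for ds, utt, act in turns]


turns = [
    ("dataset_a", "I need a table for two.", "inform"),
    ("dataset_b", "What time would you like?", "ask"),
    ("dataset_b", "Thanks, goodbye.", "closing"),
    ("dataset_b", "Hmm.", "backchannel"),  # unmapped label
]

aligned = align_corpus(turns)
label_counts = Counter(label for _, label in aligned)
```

After relabeling, the pooled `aligned` list could feed any sequence or utterance classifier; `label_counts` gives a quick check of how much data falls into the fallback class, which hints at where the mapping tables are incomplete.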

## Key findings

- Unsupervised setting: 54.1% F1 on system turns in human-human dialogues
- Semi-supervised setting: F1 rises to 57.7%
- Reaching 57.7% F1 would otherwise require at least 1.7K manually annotated turns

## Abstract

Machine learning approaches for building task-oriented dialogue systems require large conversational datasets with labels to train on. We are interested in building task-oriented dialogue systems from human-human conversations, which may be available in ample amounts in existing customer care center logs or can be collected from crowd workers. Annotating these datasets can be prohibitively expensive. Recently, multiple annotated task-oriented human-machine dialogue datasets have been released; however, their annotation schemas vary across different collections, even for well-defined categories such as dialogue acts (DAs). We propose a Universal DA schema for task-oriented dialogues and align existing annotated datasets with our schema. Our aim is to train a Universal DA tagger (U-DAT) for task-oriented dialogues and use it for tagging human-human conversations. We investigate multiple datasets, propose manual and automated approaches for aligning the different schemas, and present results on a target corpus of human-human dialogues. In unsupervised learning experiments we achieve an F1 score of 54.1% on system turns in human-human dialogues. In a semi-supervised setup, the F1 score increases to 57.7%, which would otherwise require at least 1.7K manually annotated turns. For new domains, we show further improvements when unlabeled or labeled target domain data is available.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.03020/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1907.03020/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/1907.03020/full.md

---
Source: https://tomesphere.com/paper/1907.03020