Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets

Lucas Druart (LIA); Valentin Vielzeuf; Yannick Estève (LIA)

arXiv:2406.13269·cs.AI·June 21, 2024

TL;DR

This paper explores cost-effective methods for annotating spoken dialogue datasets using large language models, aiming to improve semantic representations crucial for task-oriented dialogue systems.

Contribution

It evaluates how effective fine-tuned large language models are at automatic annotation and discusses the implications for semi-automatic annotation.

Findings

01. Fine-tuning LLMs enhances semantic annotation quality.
02. Automatic annotations capture relevant domain knowledge.
03. Semi-automatic approaches offer promising annotation efficiency.

Abstract

In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The course of the dialogue thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets lag behind. This paper provides insights into the automatic enhancement of the semantic representations of spoken dialogue datasets. Our contributions are threefold: (1) we assess the relevance of Large Language Model fine-tuning, (2) we evaluate the knowledge captured by the produced annotations, and (3) we highlight the implications for semi-automatic annotation.
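The abstract's point that the system reasons over a database with the semantic representation, so the dialogue course depends on it, can be illustrated with a minimal sketch. Assumptions: the representation is a flat slot-value dictionary, the slot names (`area`, `food`, `pricerange`) are MultiWOZ-style placeholders, and the toy database, `query_db`, `next_action`, and the action labels are all hypothetical. None of this is the paper's annotation scheme or system.

```python
from typing import Dict, List

# Hypothetical toy database (illustrative only, not from the paper).
RESTAURANTS: List[Dict[str, str]] = [
    {"name": "Golden Wok", "area": "centre", "food": "chinese", "pricerange": "cheap"},
    {"name": "La Tasca", "area": "centre", "food": "spanish", "pricerange": "moderate"},
    {"name": "Curry Prince", "area": "east", "food": "indian", "pricerange": "moderate"},
]


def query_db(state: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every entry consistent with all slot-value pairs in the state."""
    return [
        row for row in RESTAURANTS
        if all(row.get(slot) == value for slot, value in state.items())
    ]


def next_action(state: Dict[str, str]) -> str:
    """Toy policy: pick the system action from the number of matching entries."""
    matches = query_db(state)
    if not matches:
        return "inform_no_match"
    if len(matches) == 1:
        return f"offer({matches[0]['name']})"
    return "request_more_constraints"


# "I'd like a moderately priced place in the centre."
state = {"area": "centre", "pricerange": "moderate"}
print(next_action(state))  # offer(La Tasca)
```

Dropping the `pricerange` slot from the same request leaves two matches, so the toy policy asks for more constraints instead of making an offer. This is one way a coarser semantic representation can change the course of the dialogue.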
