Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets
Lucas Druart (LIA), Valentin Vielzeuf, Yannick Estève (LIA)

TL;DR
This paper explores cost-effective methods for annotating spoken dialogue datasets using large language models, aiming to improve semantic representations crucial for task-oriented dialogue systems.
Contribution
It evaluates the effectiveness of fine-tuning large language models for automatic annotation and discusses the implications of semi-automatic annotation.
Findings
Fine-tuning LLMs enhances semantic annotation quality.
Automatic annotations capture relevant domain knowledge.
Semi-automatic approaches offer promising annotation efficiency.
Abstract
In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The dialogue course thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets fall behind. This paper provides insights into automatic enhancement of spoken dialogue datasets' semantic representations. Our contributions are threefold: we (1) assess the relevance of Large Language Model fine-tuning, (2) evaluate the knowledge captured by the produced annotations, and (3) highlight the implications of semi-automatic annotation.
