Multi-party Goal Tracking with LLMs: Comparing Pre-training,   Fine-tuning, and Prompt Engineering

Angus Addlesee; Weronika Siei\'nska; Nancie Gunson; Daniel Hern\'andez; Garcia; Christian Dondrup; Oliver Lemon

arXiv:2308.15231·cs.CL·August 30, 2023·1 cites

Multi-party Goal Tracking with LLMs: Comparing Pre-training, Fine-tuning, and Prompt Engineering

Angus Addlesee, Weronika Siei\'nska, Nancie Gunson, Daniel Hern\'andez, Garcia, Christian Dondrup, Oliver Lemon

PDF

Open Access 1 Repo

TL;DR

This study assesses how well current Large Language Models can understand multi-party goal-oriented conversations, comparing fine-tuning, pre-training, and prompt engineering, with GPT-3.5-turbo showing notable effectiveness in limited-data scenarios.

Contribution

It introduces a novel multi-party conversation dataset and systematically compares different LLM approaches for goal tracking and intent recognition in MPCs.

Findings

01

GPT-3.5-turbo outperforms fine-tuned models in few-shot settings.

02

Reasoning prompts achieve the highest accuracy in goal and intent recognition.

03

Multi-party conversations remain challenging for current LLMs.

Abstract

This paper evaluates the extent to which current Large Language Models (LLMs) can capture task-oriented multi-party conversations (MPCs). We have recorded and transcribed 29 MPCs between patients, their companions, and a social robot in a hospital. We then annotated this corpus for multi-party goal-tracking and intent-slot recognition. People share goals, answer each other's goals, and provide other people's goals in MPCs - none of which occur in dyadic interactions. To understand user goals in MPCs, we compared three methods in zero-shot and few-shot settings: we fine-tuned T5, created pre-training tasks to train DialogLM using LED, and employed prompt engineering techniques with GPT-3.5-turbo, to determine which approach can complete this novel task with limited data. GPT-3.5-turbo significantly outperformed the others in a few-shot setting. The `reasoning' style prompt, when given 7%…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

addleseehq/mpgt-eval
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education

MethodsAttention Is All You Need · None · {Dispute@FaQ-s}How to file a dispute with Expedia? · 15 Ways to Contact How can i speak to someone at Delta Airlines · Linear Layer · Layer Normalization · Refunds@Expedia|||How do I get a full refund from Expedia? · Cosine Annealing · Weight Decay · Linear Warmup With Cosine Annealing