MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations

Gia-Bao Dinh Ho; Chang Wei Tan; Zahra Zamanzadeh Darban; Mahsa Salehi; Gholamreza Haffari; Wray Buntine

arXiv:2409.14801·cs.CL·September 24, 2024

TL;DR

This paper introduces MTP, a multi-modal dataset for identifying turning points in casual conversations, and presents TPMaven, a framework that uses vision-language models to build a narrative from each video and large language models to classify and detect these critical moments.

Contribution

The work provides a new dataset with precise, timestamped annotations of conversational turning points, and a framework that combines vision-language models and large language models to detect turning points and explain them.

Findings

1. TPMaven achieves an F1-score of 0.88 in classification.
2. The dataset includes high-consensus, multi-modal annotations.
3. The generated explanations align well with human judgments.
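The classification result above is reported as an F1-score, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN). The minimal Python sketch below computes it for a binary task, assuming (this page does not say so) that classification means deciding whether a conversation contains a turning point. It also includes a hypothetical detection variant that matches predicted turning-point timestamps to annotated ones within a tolerance window; the greedy matching and the 2-second `tol` default are illustrative assumptions, not the paper's evaluation protocol.

```python
from typing import Sequence


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equal to 2PR / (P + R); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def classification_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Binary F1; label 1 = 'conversation contains a turning point' (assumed framing)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return f1_from_counts(tp, fp, fn)


def detection_f1(gold_ts: Sequence[float], pred_ts: Sequence[float],
                 tol: float = 2.0) -> float:
    """Hypothetical timestamp-detection F1 using greedy one-to-one matching.

    A prediction counts as a hit if it lies within `tol` seconds of a
    still-unmatched annotated timestamp. `tol` is an illustrative parameter.
    """
    unmatched = sorted(gold_ts)
    tp = 0
    for p in sorted(pred_ts):
        # Closest annotated timestamp not yet claimed by an earlier prediction.
        best = min(unmatched, key=lambda g: abs(g - p), default=None)
        if best is not None and abs(best - p) <= tol:
            unmatched.remove(best)
            tp += 1
    fp = len(pred_ts) - tp  # predictions with no annotated match
    fn = len(gold_ts) - tp  # annotated turning points that were missed
    return f1_from_counts(tp, fp, fn)
```

For example, gold labels [1, 1, 0, 0, 1] against predictions [1, 0, 0, 1, 1] give 2 true positives, 1 false positive, and 1 false negative, so F1 = 4/6 ≈ 0.67. Greedy nearest-match assignment is the simplest choice; an optimal one-to-one assignment (for example, Hungarian matching) can score higher when several predictions crowd around the same annotation.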

Abstract

Detecting critical moments, such as emotional outbursts or changes in decisions during conversations, is crucial for understanding shifts in human behavior and their consequences. Our work introduces a novel problem setting focusing on these moments as turning points (TPs), accompanied by a meticulously curated, high-consensus, human-annotated multi-modal dataset. We provide precise timestamps, descriptions, and visual-textual evidence highlighting changes in emotions, behaviors, perspectives, and decisions at these turning points. We also propose a framework, TPMaven, utilizing state-of-the-art vision-language models to construct a narrative from the videos and large language models to classify and detect turning points in our multi-modal dataset. Evaluation results show that TPMaven achieves an F1-score of 0.88 in classification and 0.61 in detection, with additional explanations…
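The abstract describes TPMaven as first turning each video into a narrative with vision-language models and then having large language models classify and detect turning points. The sketch below illustrates only that hand-off, under stated assumptions: the `Event` record, the SCENE/SAID tags, the mm:ss stamps, and the prompt wording are all hypothetical, and no real model is called. The captions stand in for vision-language-model output, and the returned prompt is what would be sent to a language model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One timestamped observation (hypothetical record, not the MTP schema)."""
    start: float  # seconds from the start of the clip
    kind: str     # "visual" (a VLM-style caption) or "speech" (a transcript line)
    text: str


def build_narrative(captions: List[Event], utterances: List[Event]) -> str:
    """Interleave visual captions and transcript lines into one time-ordered text."""
    # Sort by start time; ties are broken by kind so the output is deterministic.
    events = sorted(captions + utterances, key=lambda e: (e.start, e.kind))
    lines = []
    for e in events:
        mm, ss = divmod(int(e.start), 60)  # whole seconds rendered as mm:ss
        tag = "SCENE" if e.kind == "visual" else "SAID"
        lines.append(f"[{mm:02d}:{ss:02d}] {tag}: {e.text}")
    return "\n".join(lines)


def build_prompt(narrative: str) -> str:
    """Hypothetical prompt asking a language model to classify and locate a turning point."""
    return (
        "Below is a timestamped narrative of a casual conversation.\n"
        "1. Answer YES or NO: does it contain a turning point (a change in "
        "emotion, behavior, perspective, or decision)?\n"
        "2. If YES, give the timestamp of the turning point and explain why.\n\n"
        + narrative
    )


# Toy example (invented data, for illustration only).
captions = [
    Event(0.0, "visual", "Two friends sit at a kitchen table."),
    Event(62.0, "visual", "One friend stands up abruptly."),
]
utterances = [
    Event(3.5, "speech", "A: So, about the trip..."),
    Event(60.2, "speech", "B: I'm not going anymore."),
]
narrative = build_narrative(captions, utterances)
prompt = build_prompt(narrative)
print(narrative)
```

Keeping timestamps inside the narrative is one way to let a language model answer with a location rather than only a yes/no label, which mirrors the split between the classification and detection settings reported in the abstract.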

Code & Models

Datasets

giaabaoo/MTP_Dataset

Videos

MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations (Underline)

Taxonomy

Topics: Natural Language Processing Techniques