Topic-Conversation Relevance (TCR) Dataset and Benchmarks
Yaran Fan, Jamie Pool, Senja Filipi, Ross Cutler

TL;DR
The paper introduces the TCR dataset, a large, diverse collection of meeting transcripts and topics, along with benchmarks using GPT-4 to assess conversation relevance for improving meeting effectiveness.
Contribution
It provides a comprehensive, multi-domain dataset and benchmarks for evaluating conversation relevance in meetings, facilitating research to enhance meeting quality.
Findings
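The TCR dataset contains 1,500 unique meetings, 22 million words of transcripts, and over 15,000 meeting topics.
Open-source scripts are provided to generate synthetic meetings and to create augmented meetings from the TCR data.
GPT-4 benchmarks, created for each data source, measure model accuracy on transcript-topic relevance.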
Abstract
Workplace meetings are vital to organizational collaboration, yet a large percentage of meetings are rated as ineffective. To help improve meeting effectiveness by understanding if the conversation is on topic, we create a comprehensive Topic-Conversation Relevance (TCR) dataset that covers a variety of domains and meeting styles. The TCR dataset includes 1,500 unique meetings, 22 million words in transcripts, and over 15,000 meeting topics, sourced from both newly collected Speech Interruption Meeting (SIM) data and existing public datasets. Along with the text data, we also open source scripts to generate synthetic meetings or create augmented meetings from the TCR dataset to enhance data diversity. For each data source, benchmarks are created using GPT-4 to evaluate the model accuracy in understanding transcription-topic relevance.
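To make the transcript-topic relevance task concrete, here is a minimal sketch in Python. It is not the authors' released scripts or their GPT-4 benchmark: the `Example` record, the `make_mismatched` augmentation, and the `keyword_baseline` predictor are all hypothetical names invented for illustration, and the keyword check merely stands in for a model call. It assumes each example pairs one topic with one transcript segment and a boolean relevance label.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    topic: str        # meeting topic text
    transcript: str   # transcript segment
    relevant: bool    # hypothetical gold label: was the topic discussed?


def make_mismatched(examples: List[Example]) -> List[Example]:
    """Pair each transcript with the next example's topic to create
    off-topic examples (an assumed, simplified form of augmentation)."""
    n = len(examples)
    return [
        Example(examples[(i + 1) % n].topic, ex.transcript, False)
        for i, ex in enumerate(examples)
        if examples[(i + 1) % n].topic != ex.topic
    ]


def accuracy(examples: List[Example],
             predict: Callable[[str, str], bool]) -> float:
    """Fraction of examples where the prediction matches the label."""
    if not examples:
        return 0.0
    correct = sum(predict(ex.topic, ex.transcript) == ex.relevant
                  for ex in examples)
    return correct / len(examples)


def keyword_baseline(topic: str, transcript: str) -> bool:
    """Toy stand-in for a model: relevant if any topic word appears."""
    text = transcript.lower()
    return any(word.lower() in text for word in topic.split())


originals = [
    Example("budget review", "We need to cut the budget by ten percent.", True),
    Example("hiring plan", "Two new engineers start in March, so hiring is on track.", True),
]
data = originals + make_mismatched(originals)
score = accuracy(data, keyword_baseline)
print(f"examples={len(data)} accuracy={score:.2f}")
```

Replacing `keyword_baseline` with a function that queries a language model would keep the same scoring loop. The prompt and label format for such a call are left out here because the summary does not specify them.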
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dropout · Absolute Position Encodings
