TL;DR
This paper introduces a fine-grained approach to evaluating meeting effectiveness that tracks how discussion quality changes over time, supported by a new dataset and an LLM-based evaluation framework.
Contribution
The paper presents the AMI-ME dataset and a temporal effectiveness evaluation framework with benchmarks, and demonstrates the use of LLMs for segment-wise meeting assessment.
Findings
The framework's segment-level assessments correlate highly with human judgments.
The dataset enables detailed analysis of effectiveness across different meeting types.
End-to-end systems that go from raw speech to effectiveness scores show promising results.
Abstract
Evaluating meeting effectiveness is crucial for improving organizational productivity. Current approaches rely on post-hoc surveys that yield a single coarse-grained score for an entire meeting. Such manual assessment is inherently limited in scalability, cost, and reproducibility. Moreover, a single score fails to capture the dynamic nature of collaborative discussions. We propose a new paradigm for evaluating meeting effectiveness centered on novel criteria and a temporally fine-grained approach. We define effectiveness as the rate of objective achievement over time and assess it for individual topical segments within a meeting. To support this task, we introduce the AMI Meeting Effectiveness (AMI-ME) dataset, a new meta-evaluation dataset containing 2,459 human-annotated segments from 130 AMI Corpus meetings. We also develop an automatic effectiveness evaluation framework that…
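The abstract defines effectiveness as the rate of objective achievement over time, measured per topical segment. The paper's exact formula is not given here, so the sketch below is only one possible reading: it assumes each segment carries an achievement score normalized to [0, 1] and divides it by the segment's duration in minutes. The `Segment` schema, the field names, and the example topics are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One topical segment of a meeting (hypothetical schema)."""
    topic: str
    achievement: float  # assumed: fraction of the segment's objective achieved, in [0, 1]
    start_s: float      # segment start time, in seconds
    end_s: float        # segment end time, in seconds


def effectiveness(seg: Segment) -> float:
    """Objective achievement per minute of discussion (assumed normalization)."""
    duration_min = (seg.end_s - seg.start_s) / 60.0
    if duration_min <= 0:
        raise ValueError("segment must have a positive duration")
    return seg.achievement / duration_min


# Hypothetical segments: a short, productive discussion and a longer, less productive one.
segments = [
    Segment("budget", 0.8, 0.0, 240.0),              # 4 minutes
    Segment("interface design", 0.5, 240.0, 840.0),  # 10 minutes
]
for s in segments:
    print(f"{s.topic}: {effectiveness(s):.3f} per minute")
```

Under this reading, the 4-minute segment scores 0.200 per minute and the 10-minute segment 0.050 per minute, which illustrates why a per-segment rate can separate parts of a meeting that a single whole-meeting survey score would blur together.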
