MeetBench-XL: Calibrated Multi-Dimensional Evaluation and Learned Dual-Policy Agents for Real-Time Meetings

Yuelin Hu; Jun Xu; Bingcong Lu; Zhengxue Cheng; Hongwei Hu; Ronghua Wu; Li Song

arXiv:2602.03285 · cs.AI · February 4, 2026

TL;DR

MeetBench-XL introduces a comprehensive, calibrated evaluation framework and a dual-policy AI agent for enterprise meetings, addressing real-world complexities and improving task handling in live, multi-stakeholder environments.

Contribution

This work presents a new bilingual, multimodal enterprise meeting dataset, a multi-dimensional evaluation protocol aligned with human judgment, and a dual-policy agent optimizing reasoning paths and tool use.

Findings

01

The MeetAll dataset covers 231 enterprise meetings totaling 140 hours, with questions validated by domain-expert review and human discriminability studies.

02

The MeetBench-XL evaluation protocol correlates well with human judgment across multiple metrics.

03

MeetMaster-XL outperforms single-model baselines in real-world deployment scenarios.

Abstract

Enterprise meeting environments require AI assistants that handle diverse operational tasks, from rapid fact-checking during live discussions to cross-meeting analysis for strategic planning, under strict latency, cost, and privacy constraints. Existing meeting benchmarks mainly focus on simplified question answering and fail to reflect real-world enterprise workflows, where queries arise organically from multi-stakeholder collaboration, span long temporal contexts, and require tool-augmented reasoning. We address this gap through a grounded dataset and a learned agent framework. First, we introduce MeetAll, a bilingual and multimodal corpus derived from 231 enterprise meetings totaling 140 hours. Questions are injected using an enterprise-informed protocol validated by domain-expert review and human discriminability studies. Unlike purely synthetic benchmarks, this protocol is…

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling · Advanced Graph Neural Networks