TL;DR
POQue introduces a novel dataset and method for understanding complex events by asking participant-specific outcome questions, enabling deeper semantic analysis and revealing gaps in current language models' comprehension.
Contribution
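The paper presents POQue, a dataset of participant-specific outcome questions over complex events, together with the multi-step annotation interface and quality-control strategy used to collect it. The dataset supports both the development and the evaluation of event-understanding models.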
Findings
High inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa)
Current language models underperform compared to humans on the task
Dataset enables exploration of multiple aspects of event understanding
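For readers unfamiliar with the agreement statistic cited above, the following is a minimal sketch of the standard (unweighted) Fleiss' kappa. The paper reports a weighted variant whose weighting scheme is not described on this page, so this code is illustrative only and is not the authors' computation. The function name `fleiss_kappa` and the toy rating table are assumptions made for the example.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Unweighted Fleiss' kappa.

    counts[i][j] is the number of annotators who put item i in category j.
    Every item must be rated by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two ratings per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: fraction of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 items, 3 annotators, 2 categories (not from POQue).
toy_counts = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(round(fleiss_kappa(toy_counts), 3))
```

A value of 1 means perfect agreement and 0 means agreement at chance level, so the reported 0.74-0.96 range sits toward the high end of the scale.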
Abstract
Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowd workers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQue (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human…
