TL;DR
This paper builds targeted test suites that probe whether neural language models encode constraints on discourse and dialogue coherence, testing specific linguistic devices rather than only sentence order perturbations.
Contribution
It introduces an extendable set of test suites for evaluating coherence in neural language models, extending the targeted evaluation paradigm (Marvin and Linzen, 2018) to linguistic devices beyond syntax and sentence order.
Findings
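Neural models trained on a language modelling objective encode some coherence constraints but not others.
Because the test suites target specific linguistic devices, the framework supports a more fine-grained analysis of what constitutes coherence than sentence order perturbations allow.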
Models show varying performance across different coherence phenomena.
Abstract
Coherent discourse is distinguished from a mere collection of utterances by the satisfaction of a diverse set of constraints, for example choice of expression, logical relation between denoted events, and implicit compatibility with world-knowledge. Do neural language models encode such constraints? We design an extendable set of test suites addressing different aspects of discourse and dialogue coherence. Unlike most previous coherence evaluation studies, we address specific linguistic devices beyond sentence order perturbations, allowing for a more fine-grained analysis of what constitutes coherence and what neural models trained on a language modelling objective do encode. Extending the targeted evaluation paradigm for neural language models (Marvin and Linzen, 2018) to phenomena beyond syntax, we show that this paradigm is equally suited to evaluate linguistic qualities that…
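The targeted evaluation paradigm referred to in the abstract compares the scores a language model assigns to the two members of a minimal pair and counts how often the acceptable variant scores higher. The following is a minimal sketch of that comparison in Python, not the authors' implementation. The names `targeted_accuracy` and `make_toy_bigram_scorer`, the toy corpus, and the example pairs are all illustrative assumptions, and the smoothed bigram scorer is only a stand-in for a neural language model.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (coherent variant, incoherent variant)


def targeted_accuracy(pairs: Iterable[Pair], score: Callable[[str], float]) -> float:
    """Fraction of pairs where the coherent variant gets a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    hits = sum(1 for good, bad in pairs if score(good) > score(bad))
    return hits / len(pairs)


def make_toy_bigram_scorer(corpus: List[str], alpha: float = 1.0) -> Callable[[str], float]:
    """Hypothetical stand-in for a neural LM: add-alpha smoothed bigram log-probability."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    vocab = set()
    for line in corpus:
        tokens = ["<s>"] + line.lower().split()
        vocab.update(tokens)
        contexts.update(tokens[:-1])            # tokens that precede another token
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(vocab)

    def score(text: str) -> float:
        tokens = ["<s>"] + text.lower().split()
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = bigrams[(prev, cur)] + alpha
            denominator = contexts[prev] + alpha * vocab_size
            total += math.log(numerator / denominator)
        return total

    return score


# Toy data, invented for illustration only.
toy_corpus = [
    "a man walked in . he sat down .",
    "mary came in . she sat down .",
]
toy_pairs = [
    ("a man walked in . he sat down .", "a man walked in . down sat he ."),
    ("mary came in . she sat down .", "mary came in . down sat she ."),
]

scorer = make_toy_bigram_scorer(toy_corpus)
accuracy = targeted_accuracy(toy_pairs, scorer)
print(f"accuracy on toy pairs: {accuracy:.2f}")
```

In a real test suite, `score` would wrap a trained language model, and each suite would contain pairs that differ only in the coherence device under test, so that accuracy can be reported per phenomenon.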