Lessons from Computational Modelling of Reference Production in Mandarin and English

Guanyi Chen; Kees van Deemter

arXiv:2011.07398 · cs.CL · August 17, 2021

TL;DR

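This paper evaluates classic referring expression generation (REG) algorithms on an annotated Mandarin corpus and compares the results with earlier evaluations on English. It finds a much higher proportion of under-specified expressions than previous studies had suggested, in both languages, and analyzes issues that arise from the grammar of Mandarin.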

Contribution

The paper annotates a previously introduced corpus of Mandarin referring expressions (REs), evaluates classic REG algorithms on it, and compares the results with earlier REG evaluations for English, yielding new findings about under-specification.

Findings

01

Higher under-specification rates in Mandarin and English than previously reported

02

Identified issues for REG that arise from the grammar of Mandarin

03

Highlighted shortcomings in previous REG evaluation methods

Abstract

Referring expression generation (REG) algorithms offer computational models of the production of referring expressions. In earlier work, a corpus of referring expressions (REs) in Mandarin was introduced. In the present paper, we annotate this corpus, evaluate classic REG algorithms on it, and compare the results with earlier results on the evaluation of REG for English referring expressions. Next, we offer an in-depth analysis of the corpus, focusing on issues that arise from the grammar of Mandarin. We discuss shortcomings of previous REG evaluations that came to light during our investigation and we highlight some surprising results. Perhaps most strikingly, we found a much higher proportion of under-specified expressions than previous studies had suggested, not just in Mandarin but in English as well.
