Zero and Few-shot Semantic Parsing with Ambiguous Inputs

Elias Stengel-Eskin; Kyle Rawlins; Benjamin Van Durme

arXiv:2306.00824 · cs.CL · January 23, 2024 · 2 citations

TL;DR

This paper introduces AmP, a framework and dataset for translating ambiguous natural language into formal representations. It finds that large models struggle to capture the range of possible meanings unless explicitly instructed, which highlights the need to model ambiguity explicitly.

Contribution

The paper presents AmP, a novel dataset and challenge for handling ambiguity in semantic parsing, and introduces three new metrics for evaluating how models handle ambiguous inputs.

Findings

01

Large pre-trained models perform poorly at capturing the distribution of possible meanings without explicit instruction.

02

Models capture the distribution of meanings well when ambiguity is present in their inputs.

03

Including ambiguity explicitly improves model understanding and evaluation.
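Findings 01 and 02 concern whether a model's outputs reproduce a distribution over readings. One simple way to quantify this is sketched below purely as an assumption, not as any of the paper's three metrics: classify sampled outputs against two reference logical forms by exact string match, then compare the observed share of the first reading with an intended share (for example, the mix of readings shown in a few-shot prompt). The function names and the mean-absolute-gap summary are hypothetical.

```python
from collections import Counter


def interpretation_distribution(outputs: list[str], lf0: str, lf1: str) -> dict[str, float]:
    """Share of outputs matching each reading; exact string match is a simplifying assumption."""
    if not outputs:
        raise ValueError("need at least one output")
    labels = ("lf0" if out == lf0 else "lf1" if out == lf1 else "other" for out in outputs)
    counts = Counter(labels)
    return {label: counts[label] / len(outputs) for label in ("lf0", "lf1", "other")}


def mean_abs_gap(target_p0: list[float], observed_p0: list[float]) -> float:
    """Average absolute difference between intended and observed share of reading 0."""
    if len(target_p0) != len(observed_p0) or not target_p0:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(t - o) for t, o in zip(target_p0, observed_p0)) / len(target_p0)


# Toy usage: placeholder strings stand in for real logical forms.
outputs = ["LF_A"] * 6 + ["LF_B"] * 3 + ["garbled"]
dist = interpretation_distribution(outputs, lf0="LF_A", lf1="LF_B")
print(dist)  # {'lf0': 0.6, 'lf1': 0.3, 'other': 0.1}
```

Exact match is deliberately strict; a real evaluation would likely need to normalize logical forms (variable renaming, conjunct order) before comparing them.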

Abstract

Despite the frequent challenges posed by ambiguity when representing meaning via natural language, it is often ignored or deliberately removed in tasks mapping language to formally-designed representations, which generally assume a one-to-one mapping between linguistic and formal representations. We attempt to address this shortcoming by introducing AmP, a framework, dataset, and challenge for translating ambiguous natural language to formal representations like logic and code. We define templates and generate data for five well-documented linguistic ambiguities. Using AmP, we investigate how several few-shot text-to-code systems handle ambiguity, introducing three new metrics. We find that large pre-trained models perform poorly at capturing the distribution of possible meanings without deliberate instruction. However, models are able to capture the distribution well when ambiguity is…
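The abstract says AmP defines templates and generates data for linguistic ambiguities, pairing each ambiguous sentence with formal representations. As a rough illustration of that idea only, the Python sketch below crosses a small lexicon through one template for prepositional-phrase (PP) attachment, a classic structural ambiguity, and emits one logical form per reading. The lexicon, the `pp_attachment` helper, and the neo-Davidsonian LF notation are all hypothetical choices made for this sketch, not AmP's actual templates or output format.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class AmbiguousExample:
    sentence: str
    verb_attach_lf: str  # reading 1: "with" modifies the seeing event (an instrument)
    noun_attach_lf: str  # reading 2: "with" modifies the object noun


def pp_attachment(subject: str, verb: str, obj: str, instrument: str) -> AmbiguousExample:
    """Fill a hypothetical PP-attachment template and build one LF per reading."""
    sentence = f"{subject} {verb} the {obj} with the {instrument}."
    # Conjuncts shared by both readings; only the attachment of "with" differs.
    core = f"{verb}(e, {subject.lower()}, x) & {obj}(x) & {instrument}(y)"
    return AmbiguousExample(
        sentence=sentence,
        verb_attach_lf=f"exists e x y . {core} & with(e, y)",
        noun_attach_lf=f"exists e x y . {core} & with(x, y)",
    )


# Hypothetical lexicon: every combination of fillers yields one example.
SUBJECTS = ["Galileo", "Mary"]
VERBS = ["saw", "observed"]
OBJECTS = ["boy", "man"]
INSTRUMENTS = ["telescope", "binoculars"]

dataset = [pp_attachment(*fillers) for fillers in product(SUBJECTS, VERBS, OBJECTS, INSTRUMENTS)]
print(len(dataset))  # 16
print(dataset[0].sentence)  # Galileo saw the boy with the telescope.
print(dataset[0].noun_attach_lf)  # exists e x y . saw(e, galileo, x) & boy(x) & telescope(y) & with(x, y)
```

Because both readings share every conjunct except the attachment of `with`, each pair isolates exactly the decision a parser has to make.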

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating: 8 (accept, good paper) · Confidence: 4

Strengths

1. The AmP dataset is a significant contribution, providing a resource specifically designed for investigating ambiguity in semantic parsing, which is a relatively unexplored area.
2. The paper takes a comprehensive approach by addressing the challenge from the perspective of both dataset creation and model evaluation.
3. The introduction of zero-shot and few-shot tasks offers a rigorous evaluation framework for future research on ambiguity in semantic parsing.
4. The development of new metri…

Weaknesses

1. While the paper provides a strong foundation, it could benefit from a more detailed exploration of how ambiguity affects real-world applications of semantic parsing.
2. The AmP dataset, while novel, might still be limited in scope and diversity, potentially affecting the robustness of the study's conclusions.
3. It is unclear how the proposed methods deal with the dynamic nature of conversational context, which can significantly affect ambiguity resolution.

Reviewer 02 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 4

Strengths

* The paper aims at an important problem (handling ambiguity in semantic parsing).
* The setup is clever and allows for some interesting analyses. I think looking at the token-level confidences to see how model uncertainty is reflected in ambiguity-resolution-dependent choice points, as done in Figure 5, is a useful idea.
* The comparison to human behavior (Section 3.2) is interesting and, I imagine, could seed future experiments.
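Reviewer 02 singles out the analysis of token-level confidences at choice points that depend on how the ambiguity is resolved (Figure 5). Below is a minimal sketch of that kind of probe under assumptions of our own: both logical forms are already tokenized into lists of strings, the caller supplies the model's next-token probabilities at the first position where the two forms diverge, and confidence is renormalized over just the two competing tokens. The helper names and the probabilities are hypothetical; this is not the paper's code.

```python
def divergence_index(tokens0: list[str], tokens1: list[str]) -> int:
    """Position of the first token where two tokenized logical forms differ."""
    for i, (a, b) in enumerate(zip(tokens0, tokens1)):
        if a != b:
            return i
    return min(len(tokens0), len(tokens1))


def choice_point_confidence(
    tokens0: list[str], tokens1: list[str], next_token_probs: dict[str, float]
) -> tuple[float, float]:
    """Relative confidence in each reading at the point where the two LFs diverge.

    next_token_probs is assumed to be the model's next-token distribution after the
    shared prefix tokens0[:i]; obtaining it depends on the model and is not shown.
    """
    i = divergence_index(tokens0, tokens1)
    if i >= min(len(tokens0), len(tokens1)):
        raise ValueError("one logical form is a prefix of the other; no choice point")
    p0 = next_token_probs.get(tokens0[i], 0.0)
    p1 = next_token_probs.get(tokens1[i], 0.0)
    if p0 + p1 == 0.0:
        raise ValueError("model assigns no mass to either reading's token")
    # Renormalize over the two competing tokens only (a design choice of this sketch).
    return p0 / (p0 + p1), p1 / (p0 + p1)


# Toy usage: the readings differ only in the first argument of `with`.
verb_attach = ["with", "(", "e", ",", "y", ")"]
noun_attach = ["with", "(", "x", ",", "y", ")"]
probs_at_choice = {"e": 0.375, "x": 0.125, "y": 0.5}  # made-up numbers
print(divergence_index(verb_attach, noun_attach))  # 2
print(choice_point_confidence(verb_attach, noun_attach, probs_at_choice))  # (0.75, 0.25)
```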

Weaknesses

I'm worried about how we assign meaning to the various results, and I'm not sure how these results would feed into future work that helps parsers handle ambiguity better.
1. Human experiments: humans were given both interpretations and asked to assign confidences to them. This seems a bit different from what the models were asked to do in the zero-shot experiments, which is to implicitly pick out the ambiguity on their own. I understand it'd be hard to elicit this kind of behavior from humans; idea…

Reviewer 03 · Rating: 6 (marginally above the acceptance threshold) · Confidence: 2

Strengths

1. The motivation and writing are very clear. The paper is generally easy to follow.
2. Personally, I like the human-probability vs. model-probability experiments. It is interesting that humans prefer one interpretation over the other, and very interesting that model predictions somehow match this preference.

Weaknesses

1. The generation task is hard, especially generating logical forms. Why not formulate this as a multiple-choice problem, letting the model choose two from 10 possible combinations?
2. Is there any quantitative analysis? What kinds of errors does the model usually make?
3. The evaluation metric can be improved. I have several questions about this. Why not use the same metric for zero-shot and few-shot, since the output format is the same? Why not use language interpretations instead of LF generations? La…
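Reviewer 03 asks why the task is not framed as multiple choice, "letting the model choose two from 10 possible combinations." One reading of that suggestion: with five candidate logical forms there are C(5, 2) = 10 unordered pairs, so each pair can be scored and the best pair checked against the gold pair of readings. The sketch below assumes each candidate already has a log-probability score from some model (supplied by the caller), and summing the two scores is our own choice of pair scorer; candidate names and scores are placeholders.

```python
from itertools import combinations


def best_pair(candidate_scores: dict[str, float]) -> frozenset[str]:
    """Pick the pair of candidate LFs with the highest combined score.

    Scores are assumed to be log-probabilities, so summing them scores the pair
    by the product of the two candidates' probabilities.
    """
    if len(candidate_scores) < 2:
        raise ValueError("need at least two candidates")
    pairs = combinations(candidate_scores, 2)  # 5 candidates -> 10 unordered pairs
    return frozenset(max(pairs, key=lambda pair: sum(candidate_scores[c] for c in pair)))


def multiple_choice_accuracy(examples: list[tuple[dict[str, float], frozenset[str]]]) -> float:
    """Fraction of examples whose chosen pair exactly matches the gold pair of readings."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(best_pair(scores) == gold for scores, gold in examples)
    return hits / len(examples)


# Toy usage: placeholder candidate names with made-up log-probabilities.
scores = {"A": -1.0, "B": -1.5, "C": -4.0, "D": -3.0, "E": -6.0}
print(sorted(best_pair(scores)))  # ['A', 'B']
```

Because this pair score is additive, it selects the same pair as simply taking the two top-scoring candidates (ties aside); enumerating pairs explicitly mirrors the reviewer's 10-combination framing and leaves room for a non-additive pair scorer.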

Code & Models

Repositories

esteng/ambiguous_parsing (official)

Videos

Zero and Few-shot Semantic Parsing with Ambiguous Inputs (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications