Assessing Data Efficiency in Task-Oriented Semantic Parsing
Shrey Desai, Akshat Shrivastava, Justin Rill, Brian Moran, Safiyyah Saleem, Alexander Zotov, Ahmed Aly

TL;DR
This paper proposes a four-stage protocol to measure data efficiency in task-oriented semantic parsing, enabling practitioners to determine how much in-domain data is needed to reach desired performance levels.
Contribution
It introduces a unified, practical protocol for assessing data efficiency in semantic parsing, applicable across different models and domains.
Findings
The protocol gives an approximate measure of how much in-domain target data a parser needs to reach a given exact-match bar.
It is demonstrated in two real-world case studies, on model generalizability and intent complexity.
It gives practitioners a tool for planning and optimizing data collection.
Abstract
Data efficiency, despite being an attractive characteristic, is often challenging to measure and optimize for in task-oriented semantic parsing; unlike exact match, it can require both model- and domain-specific setups, which have, historically, varied widely across experiments. In our work, as a step towards providing a unified solution to data-efficiency-related questions, we introduce a four-stage protocol which gives an approximate measure of how much in-domain, "target" data a parser requires to achieve a certain quality bar. Specifically, our protocol consists of (1) sampling target subsets of different cardinalities, (2) fine-tuning parsers on each subset, (3) obtaining a smooth curve relating target subset (%) vs. exact match (%), and (4) referencing the curve to mine ad-hoc (target subset, exact match) points. We apply our protocol in two real-world case studies -- model…
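The four stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, `toy_fine_tune_and_eval` is a synthetic stand-in for actually fine-tuning and scoring a parser, and the piecewise-linear interpolation used for the "smooth curve" is an assumed choice, since the abstract does not say how the curve is fitted.

```python
import random
from bisect import bisect_left


def sample_subsets(examples, percents, seed=0):
    """Stage 1: draw one random subset of the target data per percentage."""
    rng = random.Random(seed)
    subsets = {}
    for pct in percents:
        k = max(1, round(len(examples) * pct / 100))
        subsets[pct] = rng.sample(examples, k)
    return subsets


def toy_fine_tune_and_eval(subset):
    """Stage 2 (placeholder): a real setup would fine-tune a parser on
    `subset` and return its exact match (%) on a held-out set. This
    synthetic saturating function only keeps the sketch runnable."""
    n = len(subset)
    return 100.0 * n / (n + 50)


def fit_curve(points):
    """Stage 3: curve relating target subset (%) to exact match (%).
    Assumption: linear interpolation between measured points, clamped
    at both ends."""
    pts = sorted(points)
    xs = [p for p, _ in pts]
    ys = [em for _, em in pts]

    def curve(pct):
        if pct <= xs[0]:
            return ys[0]
        if pct >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, pct)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (pct - x0) / (x1 - x0)

    return curve


def min_percent_for(curve, em_bar, step=0.5, max_pct=100.0):
    """Stage 4: smallest subset (%) on a grid whose predicted exact match
    reaches `em_bar`; returns None if the bar is never reached."""
    pct = step
    while pct <= max_pct:
        if curve(pct) >= em_bar:
            return pct
        pct += step
    return None


# Usage: 1,000 synthetic "target" examples, six subset sizes.
target_data = list(range(1000))
subsets = sample_subsets(target_data, [1, 5, 10, 25, 50, 100])
measured = [(pct, toy_fine_tune_and_eval(s)) for pct, s in subsets.items()]
curve = fit_curve(measured)
print("Predicted exact match at 30% of target data:", curve(30))
print("Smallest subset (%) reaching 80% exact match:", min_percent_for(curve, 80.0))
```

Reading the curve in both directions is what makes it useful: `curve(pct)` answers "what quality should I expect from this much data?", while `min_percent_for` answers "how much data do I need for this quality bar?".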
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
