DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization
Chengang Hu, Xiao Liu, Yansong Feng

TL;DR
The paper introduces DiNeR, a large realistic Chinese dataset for evaluating compositional generalization through dish name recognition, addressing limitations of synthetic datasets by including diverse linguistic phenomena and real-world complexity.
Contribution
It presents a new large-scale, realistic dataset for dish name recognition that captures diverse linguistic phenomena, along with baseline models and insights into compositional generalization.
Findings
Baseline models show varying performance on the dataset.
The dataset includes complex linguistic phenomena such as anaphora, omission, and ambiguity.
DiNeR provides a challenging benchmark for compositional generalization.
Abstract
Most of the existing compositional generalization datasets are synthetically generated, resulting in a lack of natural language variation. While there have been recent attempts to introduce non-synthetic datasets for compositional generalization, they suffer from either limited data scale or a lack of diversity in the forms of combinations. To better investigate compositional generalization with more linguistic phenomena and compositional diversity, we propose the DIsh NamE Recognition (DiNeR) task and create a large realistic Chinese dataset. Given a recipe instruction, models are required to recognize the dish name composed of diverse combinations of food, actions, and flavors. Our dataset consists of 3,811 dishes and 228,114 recipes, and involves plenty of linguistic phenomena such as anaphora, omission and ambiguity. We provide two strong baselines based on T5 and large language…
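The abstract frames each dish name as a combination of food, action, and flavor components. The sketch below is not the authors' splitting procedure. It is a minimal illustration of what a compositional train/test split checks: every component in a test dish name also appears in training, but at least some combinations of components do not. It assumes a hypothetical `Dish` record that holds exactly one action, one flavor, and one food. The dish names are toy English glosses, not entries from DiNeR.

```python
from itertools import combinations
from typing import NamedTuple


class Dish(NamedTuple):
    """Hypothetical decomposition of a dish name into primitive components."""
    action: str
    flavor: str
    food: str


def atoms(dish: Dish) -> set[tuple[str, str]]:
    """Primitive components tagged with their role, e.g. ('food', 'chicken')."""
    return set(zip(Dish._fields, dish))


def compounds(dish: Dish) -> set[frozenset[tuple[str, str]]]:
    """Every pair of components that co-occur in one dish name."""
    return {frozenset(pair) for pair in combinations(atoms(dish), 2)}


def check_compositional_split(train: list[Dish], test: list[Dish]) -> dict[str, bool]:
    """Toy checks that a split tests recombination of seen components."""
    train_atoms = set().union(*(atoms(d) for d in train))
    train_compounds = set().union(*(compounds(d) for d in train))
    test_atoms = set().union(*(atoms(d) for d in test))
    test_compounds = set().union(*(compounds(d) for d in test))
    return {
        # Every component seen at test time was also seen during training.
        "atoms_covered": test_atoms <= train_atoms,
        # At least one test combination never occurs in training, so the
        # model cannot succeed by memorizing whole dish names.
        "has_novel_compound": bool(test_compounds - train_compounds),
        # No complete dish name appears in both splits.
        "no_dish_overlap": not (set(train) & set(test)),
    }


# Toy usage: "braised" and "spicy" both occur in training, but never together.
train = [
    Dish("stir-fried", "spicy", "chicken"),
    Dish("braised", "sweet-and-sour", "pork"),
]
test = [Dish("braised", "spicy", "chicken")]
print(check_compositional_split(train, test))
# {'atoms_covered': True, 'has_novel_compound': True, 'no_dish_overlap': True}
```

Real dish names break the one-of-each assumption, because a name may omit the action or list several foods. A more realistic version would store a set of role-tagged components per dish, and the pairwise-compound check would carry over unchanged.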
Taxonomy
Methods: Attention Is All You Need · Byte Pair Encoding · SentencePiece · Layer Normalization · Gated Linear Unit · Attention Dropout · Linear Layer · Multi-Head Attention · Dropout
