InDEX: Indonesian Idiom and Expression Dataset for Cloze Test
Xinying Qiu, Guofeng Shi

TL;DR
This paper introduces InDEX, a large Indonesian idiom and expression dataset for cloze tests, and explores effective modeling strategies combining static and contextual embeddings to improve comprehension accuracy.
Contribution
It provides a new Indonesian idiom dataset for cloze tests and investigates novel embedding combination methods for better model performance.
Findings
Combining definition embeddings with random initialization improves model accuracy for idioms.
Static embeddings suffice for fixed expressions without special meanings.
The dataset enables more accurate idiom comprehension modeling.
Abstract
We propose InDEX, an Indonesian Idiom and Expression dataset for cloze tests. The dataset contains 10,438 unique sentences for 289 idioms and expressions, for which we generate 15 different types of distractors, resulting in a large cloze-style corpus. Many baseline models for cloze-test reading comprehension apply BERT with random initialization to learn embedding representations. Idioms and fixed expressions differ, however, in that the literal meaning of a phrase may or may not be consistent with its contextual meaning. We therefore explore different ways to combine static and contextual representations for a stronger baseline model. Experiments show that combining definition and random initialization better supports cloze test model performance for idioms, whether independently or mixed with fixed expressions. For fixed expressions with no special meaning, by contrast, static…
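To make the cloze-style format concrete, the following is a minimal sketch of how one test item could be assembled from a sentence, a target expression, and a set of distractors. It is an illustration only, not the authors' pipeline: the `ClozeItem` class, the `make_cloze_item` helper, the blank marker, and the example sentence and distractors are all hypothetical, and the paper's 15 distractor types are not modeled here.

```python
import random
from dataclasses import dataclass

BLANK = "____"  # hypothetical blank marker, not specified by the paper


@dataclass
class ClozeItem:
    text: str          # sentence with the target expression blanked out
    options: list      # shuffled candidates (answer plus distractors)
    answer_index: int  # position of the correct expression in options


def make_cloze_item(sentence, target, distractors, rng):
    """Blank out the first occurrence of `target` and shuffle it with distractors."""
    if target not in sentence:
        raise ValueError("target expression not found in sentence")
    text = sentence.replace(target, BLANK, 1)
    options = [target] + list(distractors)
    rng.shuffle(options)
    return ClozeItem(text, options, options.index(target))


# Hypothetical example sentence; not taken from the InDEX dataset.
item = make_cloze_item(
    "Dia selalu menjadi kambing hitam di kantor.",
    "kambing hitam",
    ["buah tangan", "kepala batu", "meja hijau"],
    random.Random(0),
)
print(item.text)
print(item.options, item.answer_index)
```

A model evaluated on such an item would score each option against the blanked sentence and be judged correct when its top-scoring option matches `answer_index`.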
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Educational Technology Systems
Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Weight Decay · Dense Connections · Residual Connection · Layer Normalization · WordPiece
