KaWAT: A Word Analogy Task Dataset for Indonesian
Kemal Kurniawan

TL;DR
KaWAT is a new Indonesian word analogy dataset for evaluating Indonesian word embeddings. The paper also shows that pretrained embeddings help on two downstream tasks.
Contribution
The paper introduces KaWAT, a new Indonesian word analogy dataset, uses it to evaluate several existing pretrained Indonesian word embeddings as well as embeddings trained on an Indonesian online news corpus, and tests those embeddings on two downstream tasks.
Findings
Pretrained embeddings help on downstream tasks, either by reducing the number of training epochs or by yielding significant performance gains.
Embeddings trained on news corpus outperform others.
KaWAT provides a word analogy benchmark for evaluating Indonesian word embeddings.
Abstract
We introduce KaWAT (Kata Word Analogy Task), a new word analogy task dataset for Indonesian. We use it to evaluate several existing pretrained Indonesian word embeddings, as well as embeddings trained on an Indonesian online news corpus. We also test the embeddings on two downstream tasks and find that pretrained word embeddings help either by reducing the number of training epochs or by yielding significant performance gains.
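To make the evaluation setup concrete, here is a minimal sketch of how a word analogy question ("a is to b as c is to ?") is commonly scored with the vector-offset method. It is an illustration only, not necessarily the scoring used in the paper: the function names `solve_analogy` and `analogy_accuracy`, the toy 3-dimensional vectors, and the choice to skip questions with out-of-vocabulary words are all assumptions, and KaWAT's actual file format is not described here.

```python
import numpy as np


def solve_analogy(emb, a, b, c):
    """Return the word whose vector is closest (by cosine) to b - a + c.

    The three question words are excluded from the candidates, as is
    customary in word analogy evaluation.
    """
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    best_word, best_score = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        score = float(vec @ target / np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(emb, questions):
    """Fraction of answerable questions (a, b, c, d) solved correctly.

    Questions containing an out-of-vocabulary word are skipped
    (an assumption of this sketch).
    """
    answerable = [q for q in questions if all(w in emb for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answerable)
    return correct / len(answerable)


# Hypothetical toy embeddings, hand-built so the analogies work out.
toy_emb = {
    "raja": np.array([1.0, 0.0, 1.0]),    # king
    "ratu": np.array([1.0, 1.0, 1.0]),    # queen
    "pria": np.array([0.0, 0.0, 1.0]),    # man
    "wanita": np.array([0.0, 1.0, 1.0]),  # woman
    "kota": np.array([1.0, 0.0, 0.0]),    # city (distractor)
}

toy_questions = [
    ("pria", "wanita", "raja", "ratu"),
    ("raja", "ratu", "pria", "wanita"),
    ("pria", "wanita", "paman", "bibi"),  # skipped: words not in toy_emb
]

if __name__ == "__main__":
    print(solve_analogy(toy_emb, "pria", "wanita", "raja"))
    print(analogy_accuracy(toy_emb, toy_questions))
```

With real embeddings, `emb` would be replaced by vectors loaded from a pretrained model and `toy_questions` by the dataset's analogy questions. Accuracy over answerable questions is then the figure typically compared across embeddings.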
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
