Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text

Angelina Aquino; Franz de Leon

arXiv:2208.01814·cs.CL·January 9, 2023

Open Access

TL;DR

This paper benchmarks zero-shot and few-shot methods for tokenization, tagging, and dependency parsing of Tagalog. It shows substantial improvements over state-of-the-art supervised baselines when little or no annotated Tagalog data is available.

Contribution

The paper introduces and benchmarks zero-shot and few-shot approaches for Tagalog grammatical analysis, addressing the scarcity of dependency-annotated Tagalog data.

Findings

1. Zero-shot and few-shot methods outperform state-of-the-art supervised baselines.
2. Word embeddings and data augmentation improve performance when only a small amount of annotated Tagalog data is available.
3. The improvements hold for both in-domain and out-of-domain Tagalog text.

Abstract

The grammatical analysis of texts in any written language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but yield poor results for languages which have little to no annotated data. To address this issue for the Tagalog language, we investigate the use of alternative language resources for creating task-specific models in the absence of dependency-annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements on grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art…
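The three tasks named in the abstract correspond to three layers of annotation on each token: the token itself, its tag, and its head and dependency relation. As a minimal sketch, assuming outputs are stored in the CoNLL-U format used by Universal Dependencies treebanks and scored with UPOS accuracy and unlabeled/labeled attachment scores (UAS/LAS), none of which the abstract specifies, the Python below reads one sentence and compares a system's output against a gold annotation. The Tagalog sentence, its annotation, and the helper names (`make_conllu`, `read_conllu`, `upos_accuracy`, `attachment_scores`) are hypothetical and for illustration only. They are not taken from the paper or from any treebank.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str    # tokenization: the surface token
    upos: str    # tagging: universal part-of-speech tag
    head: int    # parsing: 1-based index of the head token (0 = root)
    deprel: str  # parsing: dependency relation to the head


def make_conllu(text, rows):
    """Build a CoNLL-U sentence; LEMMA, XPOS, FEATS, DEPS, MISC are left as '_'."""
    lines = [f"# text = {text}"]
    for i, (form, upos, head, deprel) in enumerate(rows, start=1):
        lines.append("\t".join([str(i), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]))
    return "\n".join(lines)


def read_conllu(block):
    """Parse one CoNLL-U sentence into Tokens, skipping comments,
    multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '3.1')."""
    tokens = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens.append(Token(cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def upos_accuracy(gold, pred):
    """Fraction of tokens whose predicted UPOS tag matches the gold tag."""
    return sum(g.upos == p.upos for g, p in zip(gold, pred)) / len(gold)


def attachment_scores(gold, pred):
    """Return (UAS, LAS). Assumes gold and predicted tokenizations are identical."""
    if len(gold) != len(pred):
        raise ValueError("tokenizations differ; align tokens before scoring")
    pairs = list(zip(gold, pred))
    uas = sum(g.head == p.head for g, p in pairs) / len(pairs)
    las = sum(g.head == p.head and g.deprel == p.deprel for g, p in pairs) / len(pairs)
    return uas, las


# Hypothetical hand-written annotation of "Kumain ang bata." ("The child ate."),
# for illustration only; real Tagalog treebank conventions may differ.
TEXT = "Kumain ang bata."
GOLD = make_conllu(TEXT, [("Kumain", "VERB", 0, "root"), ("ang", "ADP", 3, "case"),
                          ("bata", "NOUN", 1, "nsubj"), (".", "PUNCT", 1, "punct")])
# Hypothetical system output: one tagging error and two relation-label errors.
PRED = make_conllu(TEXT, [("Kumain", "VERB", 0, "root"), ("ang", "DET", 3, "det"),
                          ("bata", "NOUN", 1, "obj"), (".", "PUNCT", 1, "punct")])

if __name__ == "__main__":
    gold, pred = read_conllu(GOLD), read_conllu(PRED)
    print("UPOS accuracy:", upos_accuracy(gold, pred))  # 0.75
    print("UAS, LAS:", attachment_scores(gold, pred))   # (1.0, 0.5)
```

This scorer assumes the gold and predicted tokenizations are identical. Evaluation scripts used in UD shared tasks typically align tokens by character offsets instead, so tokenization errors also lower tagging and parsing scores. The sketch omits that alignment for brevity.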
