Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised?

Boshko Koloski; Senja Pollak; Blaž Škrlj; Matej Martinc

arXiv:2202.06650·cs.CL·February 15, 2022

Open Access

TL;DR

This study investigates whether zero-shot cross-lingual keyword extraction using pretrained multilingual models outperforms traditional unsupervised methods, especially for low-resource languages with no labeled data.

Contribution

The paper demonstrates that pretrained multilingual models fine-tuned on diverse languages outperform unsupervised keyword extractors in zero-shot settings across multiple languages.

Findings

01

Pretrained models outperform unsupervised methods in all tested languages.

02

Zero-shot cross-lingual models are effective for low-resource languages.

03

Fine-tuning on multilingual corpora enhances zero-shot keyword extraction performance.

Abstract

Keyword extraction is the task of retrieving words that are essential to the content of a given document. Researchers have proposed various approaches to tackle this problem. At the topmost level, approaches are divided into those that require training (supervised) and those that do not (unsupervised). In this study, we are interested in settings where no training data is available for the language under investigation. More specifically, we explore whether pretrained multilingual language models can be employed for zero-shot cross-lingual keyword extraction on low-resource languages with limited or no available labeled training data, and whether they outperform state-of-the-art unsupervised keyword extractors. The comparison is conducted on six news article datasets covering two high-resource languages, English and Russian, and four low-resource languages, Croatian, Estonian, Latvian, and…
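To make the unsupervised side of the comparison concrete, the following is a minimal sketch of a training-free keyword ranker based on TF-IDF. It is not one of the extractors evaluated in the paper; it is a generic illustration under stated assumptions. The function names `tokenize` and `tfidf_keywords`, the whitespace-and-punctuation tokenizer, and the toy documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"\w+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by TF-IDF, with no training step.

    This is an illustrative baseline, not the method used in the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the target document.
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())

    # Words that occur in every document get an IDF of zero.
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


docs = [
    "the parliament passed the budget law",
    "the team won the football match",
    "the budget debate in the parliament continued",
]
print(tfidf_keywords(docs, 1))
```

A ranker like this needs no labeled data, which is why unsupervised methods are the natural baseline for languages without training sets. A zero-shot cross-lingual model would instead be trained on labeled documents in other languages and then applied unchanged to the target language.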

Taxonomy

Topics: Advanced Text Analysis Techniques · Text and Document Classification Technologies · Topic Modeling