ALPET: Active Few-shot Learning for Citation Worthiness Detection in Low-Resource Wikipedia Languages
Aida Halitaj, Arkaitz Zubiaga

TL;DR
ALPET is a novel framework combining active learning and pattern-exploiting training that significantly improves citation worthiness detection in low-resource Wikipedia languages with minimal labeled data.
Contribution
The paper introduces ALPET, an approach that improves citation worthiness detection (CWD) in low-resource languages by combining active learning with pattern-exploiting training (PET), reducing labeling effort.
Findings
ALPET outperforms the CCW baseline on the Catalan, Basque, and Albanian Wikipedia datasets.
Performance plateaus after 300 labeled samples, indicating efficiency in low-resource scenarios.
Random sampling remains a strong baseline despite advanced query strategies.
Abstract
Citation Worthiness Detection (CWD) consists of determining which sentences, within an article or collection, should be backed up with a citation to validate the information they provide. This study introduces ALPET, a framework combining Active Learning (AL) and Pattern-Exploiting Training (PET), to enhance CWD for languages with limited data resources. Applied to Catalan, Basque, and Albanian Wikipedia datasets, ALPET outperforms the existing CCW baseline while reducing the amount of labeled data required, in some cases by more than 80%. ALPET's performance plateaus after 300 labeled samples, showing its suitability for low-resource scenarios where large labeled datasets are not common. While specific active learning query strategies, like those employing K-Means clustering, can offer advantages, their effectiveness is not universal, and they often yield only marginal gains over random sampling, particularly…
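The contrast between random sampling and a K-Means-based query strategy can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_query` and `kmeans_query`, the use of sentence embeddings as input, and the "nearest item to each centroid" selection rule are all assumptions made for illustration, since the abstract does not specify how the K-Means strategy selects samples.

```python
import numpy as np


def random_query(pool_size, k, rng):
    """Pick k distinct pool indices uniformly at random (the baseline)."""
    return rng.choice(pool_size, size=k, replace=False)


def kmeans_query(embeddings, k, rng, n_iter=20):
    """Pick one representative per K-Means cluster (hypothetical strategy).

    Runs plain Lloyd iterations on the unlabeled pool, then returns the
    index of the pool item closest to each centroid.
    """
    # Initialise centroids from k distinct pool points.
    start = rng.choice(len(embeddings), size=k, replace=False)
    centroids = embeddings[start].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        diffs = embeddings[:, None, :] - centroids[None, :, :]
        assign = (diffs ** 2).sum(axis=2).argmin(axis=1)
        for c in range(k):
            members = embeddings[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    diffs = embeddings[:, None, :] - centroids[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Nearest pool item to each centroid; two centroids may share a nearest
    # item, so the result is deduplicated and can hold fewer than k indices.
    return np.unique(dists.argmin(axis=0))
```

In an active learning round, the indices returned by either function would be sent for annotation, added to the labeled set, and the model retrained before the next query.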
Taxonomy
Topics: Text Readability and Simplification · Wikis in Education and Collaboration · Natural Language Processing Techniques
