Exploring Large Language Models for Relevance Judgments in Tetun
Gabriel de Jesus, Sérgio Nunes

TL;DR
This study investigates the use of large language models to automate relevance judgments in information retrieval for the low-resource language Tetun, comparing model outputs to human annotations.
Contribution
It demonstrates the feasibility of using LLMs for relevance assessment in low-resource languages, expanding automation possibilities beyond high-resource contexts.
Findings
LLMs produce relevance scores similar to human judgments.
Automated assessments show comparable inter-annotator agreement levels.
Results align with findings from high-resource language studies.
Abstract
The Cranfield paradigm has served as a foundational approach for developing test collections, with relevance judgments typically conducted by human assessors. However, the emergence of large language models (LLMs) has introduced new possibilities for automating these tasks. This paper explores the feasibility of using LLMs to automate relevance assessments, particularly within the context of low-resource languages. In our study, LLMs are given a series of query-document pairs in Tetun as input and are tasked with assigning a relevance score to each pair. These scores are then compared with those from human annotators to evaluate inter-annotator agreement levels. Our investigation reveals results that align closely with those reported in studies of high-resource languages.
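The comparison between LLM and human relevance scores can be illustrated with a small agreement calculation. The sketch below assumes Cohen's kappa as the agreement measure and a graded 0 to 3 relevance scale. Neither is stated in the abstract, and the score lists are invented purely for illustration, not data from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: derived from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[k] * counts_b[k] for k in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical relevance scores for eight query-document pairs.
human_scores = [3, 2, 0, 1, 3, 0, 2, 1]
llm_scores = [3, 2, 0, 1, 2, 0, 2, 0]

kappa = cohen_kappa(human_scores, llm_scores)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly preferred over simple percentage overlap when relevance labels are unevenly distributed.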
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Speech and dialogue systems
Methods: ALIGN
