IndoBERT-Relevancy: A Context-Conditioned Relevancy Classifier for Indonesian Text

Muhammad Apriandito Arya Saputra; Andry Alamsyah; Dian Puteri Ramadhani; Thomhert Suprapto Siadari; and Hanif Fakhrurroja

arXiv:2603.26095·cs.CL·March 30, 2026

TL;DR

IndoBERT-Relevancy is a context-conditioned relevancy classifier for Indonesian text, built on IndoBERT Large (335M parameters) and trained on 31,360 labeled pairs spanning 188 topics. It achieves an F1 score of 0.948 and an accuracy of 96.5%, and is publicly available on Hugging Face.

Contribution

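The paper introduces IndoBERT-Relevancy, a relevancy classifier for Indonesian that conditions on a topical context, together with a dataset of 31,360 labeled pairs across 188 topics. The dataset is built through an iterative, failure-driven process in which targeted synthetic data is added to address specific model weaknesses and improve robustness.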

Findings

01. Achieved an F1 score of 0.948

02. Achieved an accuracy of 96.5%

03. Effectively handles both formal and informal Indonesian text
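
The two headline metrics measure different things, which is why accuracy can sit above F1 when irrelevant pairs outnumber relevant ones. The sketch below computes both from a confusion matrix. It assumes binary F1 on the "relevant" class, and the counts are hypothetical values chosen for illustration; they are not the paper's confusion matrix.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for a binary relevant / not-relevant task."""
    tp: int  # relevant pairs predicted relevant
    fp: int  # irrelevant pairs predicted relevant
    fn: int  # relevant pairs predicted irrelevant
    tn: int  # irrelevant pairs predicted irrelevant


def accuracy(c: Confusion) -> float:
    """Fraction of all pairs classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall on the positive class."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper): 95 relevant and 205 irrelevant pairs.
example = Confusion(tp=90, fp=5, fn=5, tn=200)
acc = accuracy(example)
f1 = f1_score(example)
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

With these made-up counts, accuracy is about 0.967 and F1 about 0.947. True negatives raise accuracy but do not enter F1, so a test set dominated by irrelevant pairs yields the same ordering as the reported figures.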

Abstract

Determining whether a piece of text is relevant to a given topic is a fundamental task in natural language processing, yet it remains largely unexplored for Bahasa Indonesia. Unlike sentiment analysis or named entity recognition, relevancy classification requires the model to reason about the relationship between two inputs simultaneously: a topical context and a candidate text. We introduce IndoBERT-Relevancy, a context-conditioned relevancy classifier built on IndoBERT Large (335M parameters) and trained on a novel dataset of 31,360 labeled pairs spanning 188 topics. Through an iterative, failure-driven data construction process, we demonstrate that no single data source is sufficient for robust relevancy classification, and that targeted synthetic data can effectively address specific model weaknesses. Our final model achieves an F1 score of 0.948 and an accuracy of 96.5%, handling…
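
The abstract describes a classifier that reads a topical context and a candidate text together. A minimal sketch of how such a pair could be packed into one BERT-style input follows. It assumes the standard BERT sentence-pair layout (`[CLS] A [SEP] B [SEP]` with segment ids), which the abstract does not confirm for this model. The whitespace tokenizer, the function name `build_pair_input`, and the example strings are all hypothetical stand-ins; a real system would use the model's own subword tokenizer.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def build_pair_input(context: str, text: str) -> Tuple[List[str], List[int]]:
    """Pack (context, text) into one sequence with BERT-style segment ids.

    Whitespace splitting is a stand-in for a real subword tokenizer.
    """
    ctx_tokens = context.lower().split()
    txt_tokens = text.lower().split()
    tokens = [CLS] + ctx_tokens + [SEP] + txt_tokens + [SEP]
    # Segment 0 covers [CLS], the context, and the first [SEP];
    # segment 1 covers the candidate text and the final [SEP].
    segment_ids = [0] * (len(ctx_tokens) + 2) + [1] * (len(txt_tokens) + 1)
    return tokens, segment_ids


# Hypothetical example pair: a topic and a candidate sentence.
tokens, segment_ids = build_pair_input(
    "harga beras", "Harga beras naik lagi minggu ini"
)
print(tokens)
print(segment_ids)
```

Encoding both inputs in one sequence lets every attention layer compare context and candidate tokens directly, which is one common way to make a classifier's decision depend on the topic rather than on the text alone.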

Code & Models

Repositories

https://huggingface.co
