Building an Efficient Multilingual Non-Profit IR System for the Islamic Domain Leveraging Multiprocessing Design in Rust
Vera Pavlova, Mohammed Makhlouf

TL;DR
This paper presents a lightweight, multilingual IR system for Islamic texts, leveraging domain-specific pre-training, language reduction, and Rust-based multiprocessing to enable efficient, low-resource semantic search.
Contribution
The paper introduces a novel, resource-efficient multilingual IR system tailored to Islamic literature, combining domain adaptation, language reduction, and Rust-based multiprocessing for improved performance.
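The summary does not describe how the Rust multiprocessing layer is built. As a rough illustration of the general idea behind parallel semantic search, the sketch below splits a corpus of precomputed document embeddings into chunks, scores each chunk against a query embedding on its own worker, and merges the results into a top-k list. It is a hypothetical sketch, not the authors' implementation. It assumes the embeddings already exist as `f32` vectors from some encoder, uses cosine similarity as the relevance score, and uses OS threads via `std::thread::scope` rather than separate processes. The names `cosine` and `parallel_top_k` are illustrative only.

```rust
use std::cmp::Ordering;
use std::thread;

/// Cosine similarity between two equal-length embedding vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Hypothetical helper: scores every document embedding against `query`,
/// splitting the corpus into roughly equal chunks handled by `n_workers`
/// scoped threads, then returns (doc_index, score) for the top `k` documents.
fn parallel_top_k(
    query: &[f32],
    docs: &[Vec<f32>],
    k: usize,
    n_workers: usize,
) -> Vec<(usize, f32)> {
    let n_workers = n_workers.max(1);
    // Ceiling division so every document lands in some chunk; never zero.
    let chunk_size = ((docs.len() + n_workers - 1) / n_workers).max(1);

    // Each scoped thread borrows its chunk and the query; no copying of the corpus.
    let mut scored: Vec<(usize, f32)> = thread::scope(|s| {
        let handles: Vec<_> = docs
            .chunks(chunk_size)
            .enumerate()
            .map(|(c, chunk)| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        // Convert the chunk-local index back to a global doc index.
                        .map(|(i, d)| (c * chunk_size + i, cosine(query, d)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Wait for every worker and concatenate their partial results.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect::<Vec<_>>()
    });

    // Highest score first; keep only the k best hits.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scored.truncate(k);
    scored
}
```

For example, with a query `[1.0, 0.0]` and documents `[1.0, 0.0]`, `[0.0, 1.0]`, and `[0.7, 0.7]`, a call with `k = 2` ranks document 0 first and document 2 second. Scoped threads let each worker borrow its slice of the corpus directly. A production system might instead use a work-stealing pool, an approximate nearest-neighbour index, or true multi-process workers, and this sketch does not claim which of these the paper uses.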
Findings
Superior retrieval performance over general models
Effective domain-specific pre-training and language reduction
Efficient implementation using Rust for low-resource environments
Abstract
The widespread use of large language models (LLMs) has dramatically improved many applications of Natural Language Processing (NLP), including Information Retrieval (IR). However, domains that are not driven by commercial interest often lag behind in benefiting from AI-powered solutions. One such area is religious and heritage corpora. Like similar domains, Islamic literature holds significant cultural value and is regularly consulted by scholars and the general public. Navigating this large body of text is challenging, and there is currently no unified resource that allows this data to be searched easily with advanced AI tools. This work focuses on the development of a multilingual non-profit IR system for the Islamic domain. This process poses several major challenges, such as preparing multilingual domain-specific corpora when data is limited in certain languages, deploying…
