Injecting Knowledge from Social Science Journals to Improve Indonesian Cultural Understanding by LLMs
Adimulya Kartiyasa, Bao Gia Cao, Boyang Li

TL;DR
This paper introduces IndoSoSci, a new dataset built from Indonesian social science journals, and shows that injecting this knowledge through retrieval-augmented generation improves LLM performance on the IndoCulture benchmark; combining IndoSoSci with Wikipedia achieves state-of-the-art accuracy.
Contribution
The paper presents a novel dataset, IndoSoSci, and a retrieval-augmented method for injecting social science knowledge into LLMs to improve their understanding of Indonesian culture.
Findings
Significant performance improvements on the IndoCulture benchmark.
Combining IndoSoSci with Wikipedia achieves state-of-the-art accuracy.
The proposed method effectively injects cultural knowledge into LLMs.
Abstract
Recently, efforts to improve the understanding of Indonesian cultures by large language models (LLMs) have intensified. An attractive source of cultural knowledge that has been largely overlooked is local social science journals, which likely contain substantial cultural studies written from a native perspective. We present IndoSoSci, a novel text dataset of journal article passages created from 151 open-source Indonesian social science journals. We demonstrate an effective recipe for injecting the Indonesian cultural knowledge therein into LLMs: extracting the facts related to Indonesian culture, and applying retrieval-augmented generation (RAG) with LLM-generated hypothetical documents as queries during retrieval. The proposed recipe yields strong performance gains over several strong baselines on the IndoCulture benchmark. Additionally, by combining IndoSoSci with Indonesian…
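The retrieval step described in the abstract, in which an LLM-generated hypothetical document rather than the raw question is used as the query, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' implementation: the function names (`tokenize`, `cosine`, `retrieve`, `hyde_retrieve`, `build_prompt`), the stub `fake_llm`, and the three toy facts in `FACTS`, which are not drawn from IndoSoSci. A bag-of-words cosine similarity stands in for whatever retriever the paper actually uses, and the fact-extraction step is assumed to have already happened.

```python
import math
import re
from collections import Counter
from typing import Callable, List

# Toy stand-ins for extracted cultural facts (NOT taken from IndoSoSci).
FACTS = [
    "Batik is a wax-resist dyeing technique applied to cloth.",
    "Gamelan is an ensemble of mostly percussive instruments from Java and Bali.",
    "Rendang is a slow-cooked meat dish associated with the Minangkabau people.",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_text: str, facts: List[str], k: int = 2) -> List[str]:
    """Return up to k facts most similar to query_text (zero-score facts dropped)."""
    q = Counter(tokenize(query_text))
    scored = [(cosine(q, Counter(tokenize(f))), f) for f in facts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for score, fact in scored[:k] if score > 0]


def hyde_retrieve(
    question: str,
    generate_hypothetical: Callable[[str], str],
    facts: List[str],
    k: int = 2,
) -> List[str]:
    """Query the fact store with a generated hypothetical passage, not the question."""
    hypothetical = generate_hypothetical(question)
    return retrieve(hypothetical, facts, k)


def build_prompt(question: str, retrieved: List[str]) -> str:
    """Assemble a RAG prompt that places retrieved facts before the question."""
    context = "\n".join(f"- {fact}" for fact in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def fake_llm(question: str) -> str:
    """Hypothetical stub: a real system would call an LLM to draft this passage."""
    return "At Minangkabau ceremonies, a slow-cooked meat dish is commonly served."


question = "Which dish is commonly served at Minangkabau ceremonies?"
retrieved = hyde_retrieve(question, fake_llm, FACTS, k=1)
print(build_prompt(question, retrieved))
```

In this sketch the hypothetical passage shares more vocabulary with the relevant fact than the bare question would, which is the intuition behind querying with generated documents. In practice, `fake_llm` would be replaced by a real model call and the bag-of-words scorer by a stronger retriever.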
Taxonomy
Topics: Computational and Text Analysis Methods · Language and cultural evolution · Sentiment Analysis and Opinion Mining
