A Preliminary Study of RAG for Taiwanese Historical Archives
Claire Lin, Bo-Han Feng, Xuanjun Chen, Te-Lun Yang, Hung-yi Lee, Jyh-Shing Roger Jang

TL;DR
This study explores the application of Retrieval-Augmented Generation (RAG) to Taiwanese historical archives, analyzing how metadata integration affects retrieval and answer accuracy, and identifying challenges such as hallucinations and the handling of temporal or multi-hop queries.
Contribution
It provides an initial investigation into RAG for Taiwanese historical datasets, highlighting the impact of metadata strategies and outlining persistent challenges in the domain.
Findings
Early-stage metadata integration improves both retrieval and answer accuracy.
RAG systems face hallucination issues during generation.
Handling temporal and multi-hop queries remains challenging.
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a promising approach for knowledge-intensive tasks. However, few studies have examined RAG for Taiwanese historical archives. In this paper, we present an initial study of a RAG pipeline applied to two historical Traditional Chinese datasets, Fort Zeelandia and the Taiwan Provincial Council Gazette, along with their corresponding open-ended query sets. We systematically investigate the effects of query characteristics and metadata integration strategies on retrieval quality, answer generation, and overall system performance. The results show that early-stage metadata integration enhances both retrieval and answer accuracy, while also revealing persistent challenges for RAG systems, including hallucinations during generation and difficulties in handling temporal or multi-hop historical queries.
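To make "early-stage metadata integration" concrete, here is a minimal sketch of one way it could work: metadata fields are prepended to each record's text before indexing, so the retriever can match on them. This is an illustrative assumption, not the paper's pipeline. The toy records, the field names (source, year), the bag-of-words cosine scorer, and the helper names build_index and retrieve are all hypothetical; a real system would more likely use a dense or BM25 retriever over Traditional Chinese text.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; enough for this English toy example only.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs, with_metadata):
    # Early-stage integration (assumed form): metadata values are
    # prepended to the text *before* it is indexed.
    index = []
    for doc in docs:
        text = doc["text"]
        if with_metadata:
            meta = " ".join(str(v) for v in doc["meta"].values())
            text = meta + " " + text
        index.append((doc["id"], Counter(tokenize(text))))
    return index


def retrieve(index, query, k=1):
    # Rank indexed records by similarity to the query, return top-k ids.
    q = Counter(tokenize(query))
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical records: near-identical text, distinguishable only by year.
docs = [
    {"id": "a", "text": "the council discussed the irrigation budget",
     "meta": {"source": "gazette", "year": 1952}},
    {"id": "b", "text": "the council discussed the irrigation budget again",
     "meta": {"source": "gazette", "year": 1960}},
]
query = "irrigation budget 1960"

plain_hit = retrieve(build_index(docs, with_metadata=False), query)
meta_hit = retrieve(build_index(docs, with_metadata=True), query)
print(plain_hit, meta_hit)
```

In this toy setup the plain index cannot see the year and returns record "a", while the metadata-enriched index returns record "b", whose year matches the temporal cue in the query. The example shows only the mechanism and says nothing about the size of the effect reported in the paper.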
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Graph Neural Networks
