Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution
Junyi Yuan, Jian Zhang, Fangyu Wu, Dongming Lu, Huanda Lu, Qiufeng Wang

TL;DR
This paper introduces CulTi, a specialized dataset for Chinese cultural heritage, and proposes LACLIP, a local alignment method that improves cross-modal retrieval of intricate visual and textual Chinese heritage data.
Contribution
The paper contributes CulTi, the first dataset dedicated to Chinese cultural heritage, and LACLIP, a novel training-free local alignment strategy for enhanced cross-modal retrieval.
Findings
LACLIP outperforms existing models in cross-modal retrieval accuracy.
The CulTi dataset presents unique challenges because of its intricate visual-textual alignment.
LACLIP effectively handles fine-grained semantic associations in Chinese heritage data.
Abstract
China has a long and rich history, encompassing a vast cultural heritage that includes diverse multimodal information, such as silk patterns, Dunhuang murals, and their associated historical narratives. Cross-modal retrieval plays a pivotal role in understanding and interpreting Chinese cultural heritage by bridging visual and textual modalities to enable accurate text-to-image and image-to-text retrieval. However, despite the growing interest in multimodal research, there is a lack of specialized datasets dedicated to Chinese cultural heritage, limiting the development and evaluation of cross-modal learning models in this domain. To address this gap, we propose a multimodal dataset named CulTi, which contains 5,726 image-text pairs extracted from two series of professional documents, respectively related to ancient Chinese silk and Dunhuang murals. Compared to existing general-domain…
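To make the retrieval task in the abstract concrete, the following is a minimal sketch of how text-to-image and image-to-text retrieval are commonly scored with Recall@K over paired embeddings. It is not the paper's LACLIP method and does not use CulTi data: the toy vectors and the names `cosine`, `rank_gallery`, and `recall_at_k` are assumptions made for illustration, and the sketch assumes that query i is paired with gallery item i.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose paired item (same index) is in the top k."""
    hits = sum(
        1 for i, query in enumerate(queries)
        if i in rank_gallery(query, gallery)[:k]
    )
    return hits / len(queries)


# Hypothetical toy embeddings: image_emb[i] and text_emb[i] form a pair.
image_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text_emb = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9], [0.0, 0.3, 0.8]]

# Text-to-image: texts are queries, images are the gallery.
t2i_r1 = recall_at_k(text_emb, image_emb, k=1)
# Image-to-text: images are queries, texts are the gallery.
i2t_r1 = recall_at_k(image_emb, text_emb, k=1)

print(f"text-to-image R@1: {t2i_r1:.3f}")
print(f"image-to-text R@1: {i2t_r1:.3f}")
```

In practice the embeddings would come from a vision-language encoder such as a CLIP-style model, and the two retrieval directions are usually reported separately because they can differ, as they do in this toy example.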
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
