Fusing Bidirectional Chains of Thought and Reward Mechanisms: A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage

Ruilin Liu; Zhixiao Zhao; Jieqiong Li; Chang Liu; Dongbo Wang

arXiv:2505.08167·cs.CL·June 11, 2025

Open Access

TL;DR

This paper introduces a novel training method combining bidirectional chains of thought and a reward mechanism to improve question-answering accuracy of large language models specialized in Chinese intangible cultural heritage, addressing bias and knowledge retention issues.

Contribution

The paper presents a new training approach that integrates reverse reasoning and reward-based optimization to enhance domain-specific LLM performance, demonstrating improved accuracy and generalizability across multiple fields.

Findings

01. Outperforms existing methods in accuracy, BLEU-4, and ROUGE-L scores.

02. Effective in reducing bias and catastrophic forgetting.

03. Applicable across various domains and advanced models.
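
For readers unfamiliar with the ROUGE-L metric named in the findings, the following is a minimal sketch of how such a score is commonly computed: it measures the longest common subsequence (LCS) between a generated answer and a reference answer and combines LCS-based precision and recall into an F1 value. The function names, the choice of a balanced F1, and the tokenization left to the caller are assumptions made for illustration; this page does not state the exact evaluation configuration the authors used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate_tokens, reference_tokens):
    """ROUGE-L as a balanced F1 over LCS precision and recall (assumed variant)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    lcs = lcs_length(candidate_tokens, reference_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate_tokens)
    recall = lcs / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


# Tokenization is up to the caller: whitespace splitting for English,
# character-level splitting is one simple option for Chinese text.
score = rouge_l_f1("the cat sat".split(), "the cat sat down".split())
print(round(score, 4))
```

Because the score depends on subsequence order rather than exact n-gram overlap, a generated answer that preserves the reference's word order but omits a few tokens is still rewarded.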

Abstract

The rapid development of large language models (LLMs) has provided significant support and opportunities for the advancement of domain-specific LLMs. However, fine-tuning these large models using Intangible Cultural Heritage (ICH) data inevitably faces challenges such as bias, incorrect knowledge inheritance, and catastrophic forgetting. To address these issues, we propose a novel training method that integrates bidirectional chains of thought and a reward mechanism. This method is built upon ICH-Qwen, a large language model specifically designed for the field of intangible cultural heritage. The proposed method enables the model not only to perform forward reasoning but also to improve the accuracy of its generated answers by using reverse questioning and reverse reasoning to activate its latent knowledge. Additionally, a reward mechanism is introduced during training to…
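
The idea of pairing forward reasoning with reverse questioning can be illustrated with a small data-construction sketch. Everything below is hypothetical: the `TrainingSample` structure, the two prompt templates, and the `build_bidirectional_samples` helper are assumptions made for illustration and are not taken from the paper, which may construct its training data quite differently. The reward mechanism is not sketched because the abstract shown here is cut off before it is described.

```python
from dataclasses import dataclass


@dataclass
class TrainingSample:
    direction: str  # "forward" or "reverse"
    prompt: str
    target: str


# Hypothetical prompt templates; the paper's actual wording is not given here.
FORWARD_TEMPLATE = (
    "Question: {question}\n"
    "Reason step by step, then give the answer."
)
REVERSE_TEMPLATE = (
    "Answer: {answer}\n"
    "Reason step by step about which question this answers, then state the question."
)


def build_bidirectional_samples(question, answer):
    """Turn one QA pair into a forward sample and a reverse sample."""
    forward = TrainingSample(
        direction="forward",
        prompt=FORWARD_TEMPLATE.format(question=question),
        target=answer,
    )
    reverse = TrainingSample(
        direction="reverse",
        prompt=REVERSE_TEMPLATE.format(answer=answer),
        target=question,
    )
    return [forward, reverse]


for sample in build_bidirectional_samples("<question text>", "<answer text>"):
    print(sample.direction, "->", sample.target)
```

Under this assumed setup, each QA pair contributes two samples, so the model is trained to move from question to answer and also from answer back to question.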

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Multimodal Machine Learning Applications