Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

Xia Hou; Qifeng Li; Jian Yang; Tongliang Li; Linzheng Chai; Xianjie Wu; Hangyuan Ji; Zhoujun Li; Jixuan Nie; Jingbo Dun; Wenfeng Song

arXiv:2407.03040 · cs.CL · July 4, 2024 · 1 citation

Open Access

TL;DR

This paper introduces R2S, a framework that uses dialogue logic to generate knowledge-rich multi-turn dialogues from raw documents, improving large language models' instruction tuning across diverse domains.

Contribution

The paper presents a novel method for creating multi-turn dialogues from raw documents using dialogue logic, resulting in a new dataset and enhanced instruction tuning of LLMs.

Findings

01. Created the GINSTRUCT dataset, which retains knowledge from raw documents

02. Fine-tuned GLLM to generate structured multi-turn dialogues

03. Improved LLM performance on knowledge-intensive tasks

Abstract

Instruction tuning is an effective technique for aligning the outputs of large language models (LLMs) with human preferences. However, how to generate reasonable multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages Chain of Dialogue (CoD) logic to guide LLMs in generating knowledge-intensive multi-turn dialogues for instruction tuning. By integrating raw documents from both open-source datasets and domain-specific web-crawled documents into a benchmark, K-BENCH, we cover diverse areas such as Wikipedia (English), Science (Chinese), and Artifacts (Chinese). Our approach first decides the logic flow of the current dialogue and then prompts LLMs to produce key phrases for sourcing relevant response content. This methodology enables the creation of the GI…
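The abstract outlines a two-step loop: fix the logic flow of a dialogue first, then have an LLM propose key phrases that are used to pull response content out of the raw document. The sketch below illustrates only that control flow under loudly stated assumptions. The logic labels, the question template, and `frequent_term_stub` (a word-frequency heuristic standing in for the LLM prompt) are all hypothetical. None of this is the authors' CoD implementation, prompts, or GLLM model.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Turn:
    logic: str        # dialogue-logic label for this turn (hypothetical labels)
    key_phrase: str   # phrase used to source supporting text from the document
    question: str     # user turn from a fixed template (placeholder for an LLM)
    response: str     # assistant turn: sentences copied verbatim from the document


def split_sentences(document: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def frequent_term_stub(document: str, used: set[str]) -> Optional[str]:
    """Hypothetical stand-in for the LLM key-phrase prompt.

    Returns the most frequent lowercase word of five or more letters that has
    not been used yet (ties keep first-occurrence order), or None if none remain.
    """
    words = re.findall(r"[a-z]{5,}", document.lower())
    for word, _count in Counter(words).most_common():
        if word not in used:
            return word
    return None


def build_dialogue(
    document: str,
    logic_flow: list[str],
    propose: Callable[[str, set[str]], Optional[str]] = frequent_term_stub,
) -> list[Turn]:
    """Step 1: the logic flow is fixed up front (passed in as `logic_flow`).
    Step 2: for each turn, `propose` yields a key phrase, and the response is
    grounded in the document sentences that mention that phrase."""
    sentences = split_sentences(document)
    used: set[str] = set()
    turns: list[Turn] = []
    for logic in logic_flow:
        phrase = propose(document, used)
        if phrase is None:  # no unused key phrase left: end the dialogue early
            break
        used.add(phrase)
        evidence = [s for s in sentences if phrase in s.lower()]
        question = f"({logic}) What does the text say about {phrase}?"
        turns.append(Turn(logic, phrase, question, " ".join(evidence)))
    return turns


if __name__ == "__main__":
    doc = ("Raw documents contain knowledge. Dialogues built from raw documents "
           "keep that knowledge. Instruction tuning uses dialogues.")
    for turn in build_dialogue(doc, ["overview", "detail", "follow_up"]):
        print(turn.question, "->", turn.response)
```

The `propose` parameter is the intended swap point for a function that actually prompts an LLM. Copying document sentences verbatim into each response is just one simple way to keep answers grounded in the source text, which is the knowledge-retention property the abstract emphasizes.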

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Shrink and Fine-Tune