What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices
Zhi Chen, Qiguang Chen, Libo Qin, Qipeng Guo, Haijun Lv, Yicheng Zou, Wanxiang Che, Hang Yan, Kai Chen, Dahua Lin

TL;DR
This paper introduces the MIMG framework to generate high-quality, multi-hop instruction data for long context tasks, significantly improving model performance over existing synthetic data methods.
Contribution
The paper presents the Multi-agent Interactive Multi-hop Generation (MIMG) framework, enhancing synthetic data quality for long context multi-hop tasks and systematically analyzing data generation strategies.
Findings
More than 85% of the data generated by the proposed framework is high-quality and multi-hop.
Models trained on the synthetic data can outperform models trained on larger human-annotated datasets.
The MIMG framework improves long context understanding in language models.
Abstract
Recent advancements in large language models (LLMs) with extended context windows have significantly improved performance on tasks such as information extraction, question answering, and complex planning. To succeed at long context tasks, a large body of work has sought to enhance models' long context capabilities through synthetic data. Existing methods typically use the Self-Instruct framework to generate instruction tuning data for this purpose. However, our preliminary experiments indicate that fewer than 35% of generated samples are multi-hop and more than 40% are of poor quality, limiting comprehensive understanding and further research. To improve the quality of synthetic data, we propose the Multi-agent Interactive Multi-hop Generation (MIMG) framework, incorporating a Quality Verification Agent, a Single-hop Question…
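The four MIMG components named on this page (single-hop question generation, quality verification, multiple question sampling, and multi-hop question merging) can be pictured as a small pipeline. The sketch below is illustrative only: the order of the stages, the `QA` record, the `generate_multi_hop` function, and all of the `toy_*` stand-ins are assumptions made for this example, not the authors' implementation. In a real system each stage would be backed by an LLM call.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Tuple


@dataclass
class QA:
    """Hypothetical record for one question-answer pair."""
    question: str
    answer: str
    source: str  # id of the document(s) the pair was drawn from


def generate_multi_hop(
    docs: Dict[str, str],
    gen_single_hop: Callable[[str, str], List[QA]],
    verify: Callable[[QA], bool],
    sample_pairs: Callable[[List[QA]], List[Tuple[QA, QA]]],
    merge: Callable[[QA, QA], QA],
) -> List[QA]:
    """Assumed stage order: generate, verify, sample, merge, verify again."""
    # Stage 1: draft single-hop questions from every document.
    single = [qa for doc_id, text in docs.items()
              for qa in gen_single_hop(doc_id, text)]
    # Stage 2: drop single-hop pairs that fail the quality check.
    single = [qa for qa in single if verify(qa)]
    # Stage 3: choose which single-hop questions to combine.
    pairs = sample_pairs(single)
    # Stage 4: merge each chosen pair and re-check the merged result.
    merged = [merge(a, b) for a, b in pairs]
    return [qa for qa in merged if verify(qa)]


# Toy stand-ins (assumptions) so the sketch runs without any model.
def toy_gen(doc_id: str, text: str) -> List[QA]:
    # One placeholder question per non-empty sentence.
    return [QA(f"What does {doc_id} state?", s.strip(), doc_id)
            for s in text.split(".") if s.strip()]


def toy_verify(qa: QA) -> bool:
    # Placeholder quality rule: the answer must have at least three words.
    return len(qa.answer.split()) >= 3


def toy_sample(qas: List[QA]) -> List[Tuple[QA, QA]]:
    # Only combine questions that come from different documents.
    return [(a, b) for a, b in combinations(qas, 2) if a.source != b.source]


def toy_merge(a: QA, b: QA) -> QA:
    # Placeholder merge: chain the two questions and join the answers.
    return QA(f"{a.question} Given that, {b.question}",
              f"{a.answer}; {b.answer}",
              f"{a.source}+{b.source}")


docs = {
    "d1": "Paris is the capital of France. Yes.",
    "d2": "France uses the euro currency.",
}
result = generate_multi_hop(docs, toy_gen, toy_verify, toy_sample, toy_merge)
for qa in result:
    print(qa.source, "|", qa.question)
```

Passing the stages in as callables keeps the control flow separate from the choice of model or prompt for each agent, which also makes it straightforward to swap out or disable a single stage when comparing strategies.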
Peer Reviews
Decision: Submitted to ICLR 2025
1. The motivation of this paper is clear.
2. The exploration of methods within each agent module of the framework is thorough.

1. The paper contains some errors; for example, Figure 10 shows only one image but is labeled (a).
2. While the authors have explored methods within each agent module of the proposed framework to enhance data generation quality, there is a lack of ablation studies between the agents, making it unclear which agent contributes the most.
3. The experiments are not sufficiently generalized, as they were only evaluated on InternLM. I believe validation on widely used models like the LLaMA series is n
1. Compared to previous multi-hop data generation methods like Self-Instruct, the MIMG framework significantly enhances the proportion of multi-hop data, as well as the diversity and quality of the data.
2. The authors conduct a thorough analysis of various potentially impactful strategies, such as document selection strategies and the impact of question merging methods. This provides practical references for future research endeavors.
3. The synthesized long context dataset (LongMIT) effectivel

1. Although the authors provide a detailed analysis of the impact of different strategies on the multi-hop data ratio, quality, or diversity in various components, they do not analyze **the impact of these components on the final performance**. Specifically, the roles of the Quality Verification Agent, Single-hop Question Generation Agent, Multiple Question Sampling, and Multi-hop Question Merger Agent in the final framework are not discussed. Analyzing these would help demonstrate the independe
The main strengths of this paper include:
1. Innovative Multi-agent Generation Framework: The proposed Multi-agent Interactive Multi-hop Generation (MIMG) framework incorporates multiple agents (Quality Verification Agent, Single-hop Question Generation Agent, Multiple Question Sampling Strategy, and Multi-hop Question Merging Agent), significantly improving the quality and diversity of generated data.
2. Extensive Experimental Validation: The paper systematically investigates various d

The main limitations of this paper are:
1. The primary weakness of this paper lies in its limited novelty. The contributions primarily emphasize engineering implementations and optimizations rather than presenting groundbreaking theoretical or methodological advancements. While the proposed framework demonstrates effective improvements in long-context, multi-hop instruction datasets, it largely builds upon existing concepts and technologies in a structured engineering fashion.
2. Limited
Taxonomy
Topics: Innovative Teaching Methods
