Abstract2Appendix: Academic Reviews Enhance LLM Long-Context Capabilities
Shengzhi Li, Kittipat Kampa, Rongyu Lin, Bohang Li, Shichao Pei

TL;DR
This paper demonstrates that fine-tuning large language models on high-quality academic peer review data with DPO significantly improves their long-context reading abilities, outperforming supervised fine-tuning (SFT) and highlighting the value of human-written reviews.
Contribution
The paper introduces DPO fine-tuning of LLMs on high-quality academic peer reviews and shows improved performance on long-context tasks and benchmarks.
Findings
DPO outperforms SFT in data efficiency and effectiveness.
Fine-tuning with 2000 samples yields notable improvements.
High-quality human reviews are preferred over LLM-generated responses, even those from advanced models.
Abstract
Large language models (LLMs) have shown remarkable performance across various tasks, yet long-context reading remains challenging for them. This study explores the effectiveness of leveraging high-quality academic peer review data for fine-tuning LLMs to enhance their long-context capabilities. We compare the Direct Preference Optimization (DPO) method with the Supervised Fine-Tuning (SFT) method, demonstrating DPO's superiority and data efficiency. Our experiments show that the fine-tuned model achieves a 4.04-point improvement over phi-3 and a 2.6% increase on the Qasper benchmark using only 2000 samples. Although the study is limited by data scale and processing costs, it underscores the potential of DPO and high-quality data in advancing LLM performance. Additionally, the zero-shot benchmark results indicate that aggregated high-quality human reviews are…
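The abstract contrasts DPO with SFT. SFT simply maximizes the log-likelihood of a target response, whereas the standard DPO objective trains on preference pairs: for a chosen response y_w and a rejected response y_l, it minimizes -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]), where π is the policy being trained and π_ref is a frozen reference model. The sketch below is a minimal, illustrative Python version of that per-pair loss. It assumes summed response log-probabilities have already been computed; the function names and the default `beta=0.1` are hypothetical choices, not taken from the paper, and how the authors built their chosen/rejected pairs from peer reviews is not specified here.

```python
import math


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed token log-probability of a full response
    (chosen = preferred, rejected = dispreferred) under either the policy
    being trained or the frozen reference model. `beta` is an assumed default.
    """
    # Policy-vs-reference log-ratio for each response (implicit reward / beta).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is identical to softplus(-margin).
    return softplus(-margin)


def batch_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)
```

When the policy still equals the reference, every margin is zero and the loss is log 2 (about 0.693); raising the policy's log-probability on the chosen response relative to the reference lowers the loss. Unlike SFT, the rejected response and the reference model both enter the objective, which is one plausible reason preference data can be used efficiently.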
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: Direct Preference Optimization
