Reducing Distraction in Long-Context Language Models by Focused Learning

Zijun Wu; Bingyuan Liu; Ran Yan; Lei Chen; Thomas Delteil

arXiv:2411.05928 · cs.CL · November 12, 2024

TL;DR

This paper introduces a training approach that combines retrieval-based data augmentation with contrastive learning so that long-context language models focus on the relevant information, improving their performance on long-document QA tasks.

Contribution

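The paper presents a training method that improves LLMs' ability to focus on the relevant parts of a long context: a retriever supplies the most relevant segments as augmented inputs, and a contrastive objective aligns the outputs from the full context with those from the retrieved sub-context.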

Findings

01 Enhanced performance on long-document QA benchmarks

02 Improved focus on relevant context segments

03 Effective reduction of distraction in long contexts

Abstract

Recent advancements in Large Language Models (LLMs) have significantly enhanced their capacity to process long contexts. However, effectively utilizing this long context remains a challenge due to the issue of distraction, where irrelevant information dominates lengthy contexts, causing LLMs to lose focus on the most relevant segments. To address this, we propose a novel training method that enhances LLMs' ability to discern relevant information through a unique combination of retrieval-based data augmentation and contrastive learning. Specifically, during fine-tuning with long contexts, we employ a retriever to extract the most relevant segments, serving as augmented inputs. We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned. Extensive experiments on long…
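
The two ingredients named in the abstract, retrieving the most relevant segments and aligning outputs with a contrastive objective, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the retriever or the exact form of the loss, so the word-overlap retriever, the InfoNCE-style loss, the use of explicit negatives, the temperature value, and every name below (`retrieve_top_k`, `info_nce`, the example vectors) are assumptions made for illustration only.

```python
import math
import re


def tokenize(text):
    """Lowercase word tokens; a stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_top_k(question, segments, k=2):
    """Hypothetical toy retriever: keep the k segments sharing the most
    words with the question, returned in original document order."""
    q = tokenize(question)
    ranked = sorted(
        range(len(segments)),
        key=lambda i: len(q & tokenize(segments[i])),
        reverse=True,
    )
    return [segments[i] for i in sorted(ranked[:k])]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not taken from the paper):
    small when the anchor is closer to the positive than to the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Step 1: build the augmented input from the most relevant segments.
segments = [
    "The museum opened in 1902.",
    "Tickets cost ten euros on weekdays.",
    "The cafe serves soup.",
    "On weekends tickets cost twelve euros.",
]
question = "How much do tickets cost on weekends?"
sub_context = retrieve_top_k(question, segments, k=2)

# Step 2: compare made-up output vectors for the full context (anchor)
# and the retrieved sub-context (positive) against an unrelated example.
full_context_output = [1.0, 0.0]
sub_context_output = [0.9, 0.1]
unrelated_output = [0.0, 1.0]
aligned_loss = info_nce(full_context_output, sub_context_output, [unrelated_output])
misaligned_loss = info_nce(full_context_output, unrelated_output, [sub_context_output])
```

In this sketch the loss is lower when the full-context output sits close to the sub-context output than when it sits close to the unrelated one, which is the behaviour an auxiliary alignment objective is meant to reward. In a real fine-tuning setup the vectors would come from the model itself and the contrastive term would be added to the usual language-modeling loss.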

Taxonomy

Topics: Topic Modeling

Methods: Contrastive Learning · Focus