Enhancing elusive clues in knowledge learning by contrasting attention of language models

Jian Gao; Xiao Zhang; Ji Wu; Miao Li

arXiv:2409.17954·cs.AI·March 13, 2025

TL;DR

This paper introduces a method to improve knowledge learning in language models by contrasting attention patterns of different-sized models to identify and emphasize elusive clues, leading to better memorization and learning efficiency.

Contribution

The paper proposes a novel approach that leverages attention contrast between large and small models to enhance learning from subtle clues in training data.

Findings

01 Larger models focus more on non-obvious clues.

02 Contrasting attention helps identify important but overlooked clues.

03 Token-dropout guided by clues improves memorization performance.

Abstract

Causal language models acquire a vast amount of knowledge from general text corpora during pretraining, but the efficiency of knowledge learning is known to be unsatisfactory, especially when learning from small, knowledge-dense corpora. The deficiency can come from long-distance dependencies that are hard for language models to capture, and from overfitting to co-occurrence patterns and distracting clues in the training text. To address these issues, we propose a method to enhance knowledge learning during language model pretraining by enhancing elusive but important clues in text that are discovered by the language models themselves. We found that larger language models pay more attention to non-obvious but important clues, which are often overlooked by smaller language models. Therefore, we can identify these clues by contrasting the attention weights of large and small language…
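
The idea of contrasting attention and then using the result to guide token-dropout can be illustrated with a small sketch. This is not the authors' implementation: the subtraction-based contrast score, the linear mapping to keep probabilities, the min_keep parameter, and all function names are assumptions made for illustration only. It takes per-token attention weights from a large and a small model over the same tokens, scores each token by how much more attention the large model gives it, and drops low-scoring tokens more often.

```python
import random


def contrast_scores(large_attn, small_attn):
    """Per-token contrast: large-model attention minus small-model attention.

    Assumption: simple subtraction; the paper may define the contrast differently.
    """
    if len(large_attn) != len(small_attn):
        raise ValueError("attention vectors must cover the same tokens")
    return [l - s for l, s in zip(large_attn, small_attn)]


def keep_probabilities(scores, min_keep=0.5):
    """Linearly map scores to keep probabilities in [min_keep, 1.0].

    The token with the highest contrast score is always kept;
    the token with the lowest score is kept with probability min_keep.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # No contrast signal: keep every token.
        return [1.0] * len(scores)
    return [min_keep + (1.0 - min_keep) * (s - lo) / (hi - lo) for s in scores]


def guided_token_dropout(tokens, keep_probs, rng):
    """Return the tokens that survive dropout, preserving their order."""
    return [t for t, p in zip(tokens, keep_probs) if rng.random() < p]


# Hypothetical attention weights over three tokens (made-up numbers).
tokens = ["the", "born", "in"]
large_attn = [0.1, 0.6, 0.3]
small_attn = [0.3, 0.2, 0.5]

scores = contrast_scores(large_attn, small_attn)
probs = keep_probabilities(scores, min_keep=0.5)
augmented = guided_token_dropout(tokens, probs, random.Random(0))
```

Each call to guided_token_dropout with a different random seed yields a different augmented copy of the text, in which the tokens the large model attends to more strongly survive more often.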

Code & Models

Repositories

hushes-minutes/contrasting_attention
PyTorch · Official

Videos

Enhancing Elusive Clues in Knowledge Learning by Contrasting Attention of Language Models· underline

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Innovative Teaching and Learning Methods

Methods: Softmax · Attention Is All You Need