DENIAHL: In-Context Features Influence LLM Needle-In-A-Haystack Abilities
Hui Dai, Dan Pechi, Xinyi Yang, Garvit Banga, Raghav Mantri

TL;DR
This paper introduces DENIAHL, a synthetic benchmark to analyze factors affecting language models' ability to recall specific information from long contexts, revealing how features like data type and size influence performance.
Contribution
The study develops a systematic benchmark, DENIAHL, to evaluate how features beyond context length affect LLMs' needle-in-a-haystack (NIAH) abilities, extending previous NIAH studies.
Findings
GPT-3.5 outperforms LLaMA 2-7B on DENIAHL
Recall performance drops with increased item size
Changing data type from numbers to letters affects recall
Abstract
The Needle-in-a-haystack (NIAH) test is a general task used to assess language models' (LMs') abilities to recall particular information from long input context. This framework, however, does not provide a means of analyzing what factors, beyond context length, contribute to LMs' abilities or inabilities to separate and recall needles from their haystacks. To provide a systematic means of assessing what features contribute to LMs' NIAH capabilities, we developed a synthetic benchmark called DENIAHL (Data-oriented Evaluation of NIAH for LLM's). Our work expands on previous NIAH studies by ablating NIAH features beyond typical context length, including data type, size, and patterns. We find stark differences between GPT-3.5 and LLaMA 2-7B's performance on DENIAHL, as well as drops in recall performance when features like item size are increased, and to some degree when data type is changed from…
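To make the ablation idea concrete, here is a minimal sketch of how a synthetic key-value NIAH probe could vary data type and item size. It is not the authors' implementation: the function names (make_haystack, build_prompt, exact_match), the JSON key-value layout, the prompt wording, and the exact-match scoring are all assumptions made for illustration.

```python
import json
import random
import string

# Hypothetical feature values for the "data type" ablation.
ALPHABETS = {"numbers": string.digits, "letters": string.ascii_lowercase}


def random_item(rng, length, data_type):
    """Return a random string of the given length drawn from one alphabet."""
    return "".join(rng.choice(ALPHABETS[data_type]) for _ in range(length))


def make_haystack(n_items, item_len, data_type, seed=0):
    """Build a dict of unique random keys mapped to random values.

    n_items controls haystack size, item_len controls item size, and
    data_type switches between digit strings and letter strings.
    """
    if n_items > len(ALPHABETS[data_type]) ** item_len:
        raise ValueError("not enough distinct keys for this item length")
    rng = random.Random(seed)
    haystack = {}
    while len(haystack) < n_items:
        key = random_item(rng, item_len, data_type)
        if key not in haystack:
            haystack[key] = random_item(rng, item_len, data_type)
    return haystack


def build_prompt(haystack, needle_key):
    """Serialize the haystack and ask for the value stored under one key."""
    context = json.dumps(haystack)
    question = f"What is the value for key '{needle_key}'? Answer with the value only."
    return context + "\n" + question


def exact_match(response, expected):
    """Score a model response as correct only if it equals the needle value."""
    return response.strip() == expected


if __name__ == "__main__":
    haystack = make_haystack(n_items=20, item_len=4, data_type="letters", seed=1)
    needle_key = list(haystack)[10]  # a needle from the middle of the context
    prompt = build_prompt(haystack, needle_key)
    # A real run would send `prompt` to a model; here the lookup stands in for it.
    print(exact_match(haystack[needle_key], haystack[needle_key]))
```

Sweeping item_len or switching data_type while holding the other arguments fixed gives one ablation axis at a time, which is the kind of comparison the abstract describes.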
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Linear Layer · Cosine Annealing · Multi-Head Attention · Byte Pair Encoding · Weight Decay · Residual Connection · Attention Is All You Need · Softmax · Adam
