Beyond Semantics: How Temporal Biases Shape Retrieval in Transformer and State-Space Models
Anooshka Bajaj, Deven Mahesh Mistry, Sahaj Singh Maini, Yash Aggarwal, Zoran Tiganj

TL;DR
This paper investigates how large language models, including transformers and state-space models, exhibit temporal biases when retrieving information. It reveals biases similar to those of human episodic memory and discusses their implications for in-context learning.
Contribution
It introduces a novel experimental framework to isolate and analyze temporal biases in LLMs, demonstrating their presence across different architectures and their relation to episodic memory mechanisms.
Findings
Models favor tokens that followed earlier presentations of a repeated cue, especially presentations near the beginning or end of the sequence.
Temporal biases are linked to induction heads in transformers.
Memory retrieval is less reliable for information embedded in the middle of prompts.
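To make the link to induction heads concrete, here is a toy sketch of an idealized induction rule: find every earlier occurrence of the current (final) token and copy whatever followed it. This is an illustrative assumption, not the authors' analysis code, and `induction_prediction` is a hypothetical name. Such a rule spreads probability evenly over all successors of the repeated token, so it has no temporal bias. The findings above describe how real models depart from that uniform baseline.

```python
from collections import Counter

def induction_prediction(tokens):
    """Toy, idealized induction rule (not a trained model): for the final token,
    collect the token that followed each of its earlier occurrences and assign
    probability in proportion to how often each successor appeared."""
    query = tokens[-1]
    # Scan all positions except the last; the final token is the query itself.
    successors = [tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == query]
    if not successors:
        return {}
    counts = Counter(successors)
    total = len(successors)
    return {tok: n / total for tok, n in counts.items()}

# "X" appears at positions 1, 5, 8 and again at the end; it was followed by B, D, F.
print(induction_prediction(list("AXBYCXDEXFGX")))  # each of B, D, F gets probability 1/3
```

Because every successor receives the same weight regardless of where it occurred, any preference a real model shows for successors near the start or end of the prompt is a deviation from this rule.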
Abstract
In-context learning is governed by both temporal and semantic relationships, shaping how Large Language Models (LLMs) retrieve contextual information. Analogous to human episodic memory, where the retrieval of specific events is enabled by separating events that happened at different times, this work probes the ability of various pretrained LLMs, including transformer and state-space models, to differentiate and retrieve temporally separated events. Specifically, we prompted models with sequences containing multiple presentations of the same token, which reappears at the sequence end. By fixing the positions of these repeated tokens and permuting all others, we removed semantic confounds and isolated temporal effects on next-token prediction. Across diverse sequences, models consistently placed the highest probabilities on tokens following a repeated token, but with a notable bias for…
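The stimulus design described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: tokens are plain strings rather than tokenizer IDs, the non-repeated tokens are a fixed set that is re-shuffled for each prompt, and `make_permuted_sequences` and `successor_positions` are hypothetical helper names. The step of scoring each prompt with an LLM and comparing the next-token probabilities of the tokens found at the successor indices is omitted.

```python
import random

def make_permuted_sequences(filler_tokens, repeated_token, repeat_positions, n_perms, seed=0):
    """Return n_perms prompts (hypothetical helper). The repeated token occupies the
    same fixed positions in every prompt and is appended once more as the final
    (query) token; the filler tokens are re-shuffled into the remaining positions."""
    positions = set(repeat_positions)
    seq_len = len(filler_tokens) + len(positions)  # length before the final cue
    assert all(0 <= p < seq_len for p in positions), "repeat positions out of range"
    assert repeated_token not in filler_tokens, "fillers must not contain the cue"
    rng = random.Random(seed)
    prompts = []
    for _ in range(n_perms):
        order = list(filler_tokens)
        rng.shuffle(order)                   # new permutation of the non-repeated tokens
        fill = iter(order)
        seq = [repeated_token if i in positions else next(fill) for i in range(seq_len)]
        seq.append(repeated_token)           # the cue reappears at the sequence end
        prompts.append(seq)
    return prompts

def successor_positions(repeat_positions, seq_len):
    """Indices of the tokens that directly follow each earlier presentation of the cue
    (positions at or beyond seq_len, i.e. the appended query, are excluded)."""
    return [p + 1 for p in sorted(repeat_positions) if p + 1 < seq_len]

prompts = make_permuted_sequences(list("abcdefgh"), "X", {1, 5, 9}, n_perms=3)
print(prompts[0][1], prompts[0][5], prompts[0][9], prompts[0][-1])  # X X X X
print(successor_positions({1, 5, 9}, seq_len=11))                  # [2, 6, 10]
```

Holding the cue positions fixed while shuffling everything else means that, averaged over permutations, no particular filler token is tied to a particular position, which is how the design separates temporal position from token identity.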
