Daily and Weekly Periodicity in Large Language Model Performance and Its Implications for Research
Paul Tschisgale, Peter Wulff

TL;DR
This study reveals that GPT-4o's performance varies periodically over daily and weekly cycles, challenging the assumption that LLM performance is stable over time in research settings.
Contribution
The paper provides the first longitudinal analysis demonstrating significant daily and weekly performance fluctuations in GPT-4o, highlighting the need to consider temporal variability in LLM research.
Findings
Spectral (Fourier) analysis attributes about 20% of the total variance in performance to periodic patterns.
The periodic patterns are consistent with interacting daily and weekly rhythms.
Results challenge the assumption of time-invariant LLM performance.
Abstract
Large language models (LLMs) are increasingly used in research as both tools and objects of study. Much of this work assumes that LLM performance under fixed conditions (identical model snapshot, hyperparameters, and prompt) is time-invariant, meaning that average output quality remains stable over time; otherwise, reliability and reproducibility would be compromised. To test the assumption of time invariance, we conducted a longitudinal study of GPT-4o's average performance under fixed conditions. The LLM was queried to solve the same physics task ten times every three hours over approximately three months. Spectral (Fourier) analysis of the resulting time series revealed substantial periodic variability, accounting for about 20% of total variance. The observed periodic patterns are consistent with interacting daily and weekly rhythms. These findings challenge the assumption of time…
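As a rough illustration of the kind of spectral analysis the abstract describes, the sketch below estimates how much of a time series' variance sits at daily and weekly periods. It is not the authors' pipeline: the function name `periodic_variance_fraction`, the 84-day window, and the synthetic amplitudes and noise level are all assumptions chosen for the example. Only the three-hour sampling interval comes from the abstract.

```python
import numpy as np

SAMPLES_PER_DAY = 8  # one measurement every three hours, as in the abstract


def periodic_variance_fraction(scores, periods_days, samples_per_day=SAMPLES_PER_DAY):
    """Share of total variance in the Fourier bins nearest the given periods.

    Hypothetical helper for illustration; periods are given in days.
    """
    x = np.asarray(scores, dtype=float)
    x = x - x.mean()  # remove the mean so the DC bin carries no power
    n = x.size
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / samples_per_day)  # cycles per day
    power = np.abs(spectrum) ** 2
    # rfft is one-sided: double every bin except DC and (for even n) Nyquist,
    # so that the summed power is proportional to the total variance.
    weights = np.full(power.shape, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0
    power = power * weights
    # Pick the bin closest to each target frequency (1 / period).
    bins = {int(np.argmin(np.abs(freqs - 1.0 / p))) for p in periods_days}
    return power[sorted(bins)].sum() / power.sum()


# Synthetic data only: 84 days (12 full weeks) so that both the daily and
# the weekly frequency fall exactly on a Fourier bin.
rng = np.random.default_rng(0)
days = 84
t = np.arange(days * SAMPLES_PER_DAY) / SAMPLES_PER_DAY  # time in days
daily = 0.05 * np.sin(2 * np.pi * t)        # assumed daily rhythm
weekly = 0.03 * np.sin(2 * np.pi * t / 7)   # assumed weekly rhythm
noise = rng.normal(0.0, 0.1, t.size)        # assumed measurement noise
scores = 0.6 + daily + weekly + noise

fraction = periodic_variance_fraction(scores, periods_days=[1.0, 7.0])
print(f"Variance share at daily + weekly periods: {fraction:.2%}")
```

Choosing a window that spans a whole number of weeks keeps both target frequencies on exact bins. With other window lengths, power leaks into neighbouring bins, and summing a small band around each target (or applying a window function) would likely give a fairer estimate.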
