Rethinking Random Transformers as Adaptive Sequence Smoothers for Sleep Staging
Guisong Liu, Xin Gao, Martin Dresler, Jiansong Zhang, Pengfei Wei

TL;DR
This paper shows that randomly initialized Transformers, without any training, act as adaptive smoothers for sleep staging, challenging the assumption that their benefit comes from learning complex long-range dependencies.
Contribution
It introduces the idea that random self-attention acts as an effective adaptive smoothing mechanism, attributing gains in sleep staging to architectural inductive bias rather than learned parameters.
Findings
Random Transformers outperform heuristic smoothing in sleep staging.
The Random Attention Prior Kernel explains the smoothing effect of random self-attention.
Most performance improvements stem from architectural bias, not training.
Abstract
Automatic sleep staging commonly adopts Transformers under the assumption that they learn complex long-range dependencies. We challenge this view by revealing a neglected property of sleep sequences: strong local temporal continuity. We show that a randomly initialized Transformer, without any training, substantially improves sleep staging performance and consistently outperforms heuristic smoothing. We formalize this effect via a Random Attention Prior Kernel (RAPK), showing that random self-attention acts as an adaptive smoother by balancing global averaging and content-based similarity while preserving stage transitions. Using two metrics, the Local Smoothness Influence Index (LSII) and the Weighted Transition Entropy (WTE), we provide evidence that most performance gains in Transformer-based sleep staging arise from architectural inductive bias rather than parameter learning. Our…
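To make the smoothing idea in the abstract concrete, here is a minimal sketch of an untrained self-attention head applied to per-epoch stage probabilities. It is not the paper's RAPK formulation or its model. The function name `random_attention_smooth`, the Gaussian initialization, the identity value projection, the absence of positional encoding, and the five-stage toy data are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def random_attention_smooth(probs, d_k=16, seed=0):
    """Smooth per-epoch stage probabilities with one untrained attention head.

    probs: array of shape (n_epochs, n_stages), each row a probability vector.
    Returns an array of the same shape whose rows are convex combinations
    of the input rows.
    """
    rng = np.random.default_rng(seed)
    n_stages = probs.shape[1]
    # Random query/key projections that are never trained
    # (Gaussian init is an assumption, not taken from the paper).
    w_q = rng.normal(0.0, 1.0 / np.sqrt(n_stages), size=(n_stages, d_k))
    w_k = rng.normal(0.0, 1.0 / np.sqrt(n_stages), size=(n_stages, d_k))
    q = probs @ w_q
    k = probs @ w_k
    # Scaled dot-product scores between every pair of epochs.
    scores = q @ k.T / np.sqrt(d_k)
    attn = softmax(scores, axis=-1)  # each row sums to 1
    # Identity value projection (assumption): the output mixes the raw
    # probability rows, weighted by random content-based similarity.
    return attn @ probs


# Toy usage: noisy per-epoch probabilities over five hypothetical stages.
toy_rng = np.random.default_rng(1)
noisy = softmax(toy_rng.normal(size=(20, 5)))
smoothed = random_attention_smooth(noisy)
smoothed_labels = smoothed.argmax(axis=1)
```

Because the attention weights in each row are non-negative and sum to one, every output row stays a valid probability vector. With no positional encoding, the mixing in this sketch depends only on content similarity between epochs, so it illustrates the "global averaging plus content-based similarity" intuition rather than any explicitly local filter.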
