Transformer Neural Processes - Kernel Regression
Daniel Jenson, Jhonathan Navott, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman

TL;DR
The paper introduces TNP-KR, a scalable Transformer Neural Process variant with novel attention mechanisms, enabling efficient inference on large datasets and outperforming existing methods on various benchmarks.
Contribution
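It proposes TNP-KR, which combines a Kernel Regression Block (KRBlock), a kernel-based attention bias, and two new attention mechanisms, scan attention (SA) and deep kernel attention (DKA), to improve scalability and performance over prior Neural Processes.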
Findings
TNP-KR handles 100K context points and 1M test points in under a minute on a single GPU.
TNP-KR with DKA outperforms Performer on most benchmarks.
TNP-KR with SA achieves state-of-the-art results.
Abstract
Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. Originally developed as a scalable alternative to Gaussian Processes (GPs), which are limited by $O(n^3)$ runtime complexity, the most accurate modern NPs can often rival GPs but still suffer from an $O(n^2)$ bottleneck due to their attention mechanism. We introduce the Transformer Neural Process - Kernel Regression (TNP-KR), a scalable NP featuring: (1) a Kernel Regression Block (KRBlock), a simple, extensible, and parameter-efficient transformer block with complexity $O(n_c^2 + n_c n_t)$, where $n_c$ and $n_t$ are the number of context and test points, respectively; (2) a kernel-based attention bias; and (3) two novel attention mechanisms: scan attention (SA), a memory-efficient scan-based attention that when paired with a kernel-based…
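To make the idea of a kernel-based attention bias concrete, here is a minimal NumPy sketch of test-to-context cross-attention with a distance-dependent term added to the attention logits. It is an illustration under stated assumptions, not the paper's implementation: the RBF form of the bias, the fixed `lengthscale`, and the function names `rbf_log_bias` and `biased_cross_attention` are all hypothetical choices made for this example. The sketch materializes the full $n_t \times n_c$ logit matrix, so it shows only the cross-attention step and does not reproduce the memory savings attributed to scan attention.

```python
import numpy as np

def rbf_log_bias(x_q, x_k, lengthscale=1.0):
    """Log of an RBF kernel between query and key locations (assumed bias form).

    Returns an (n_q, n_k) matrix with entries -||x_q - x_k||^2 / (2 * lengthscale^2).
    """
    sq_dist = ((x_q[:, None, :] - x_k[None, :, :]) ** 2).sum(-1)
    return -sq_dist / (2.0 * lengthscale ** 2)

def biased_cross_attention(q, k, v, x_q, x_k, lengthscale=1.0):
    """Scaled dot-product attention with an additive distance-based bias on the logits."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d) + rbf_log_bias(x_q, x_k, lengthscale)
    logits -= logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)            # each row is a distribution over context points
    return w @ v, w

# Toy problem: n_c context points and n_t test points on a 1-D input space.
rng = np.random.default_rng(0)
n_c, n_t, d = 8, 5, 4
x_c = rng.uniform(-2, 2, (n_c, 1))   # context locations
x_t = rng.uniform(-2, 2, (n_t, 1))   # test locations
q = rng.normal(size=(n_t, d))        # test-point queries
k = rng.normal(size=(n_c, d))        # context keys
v = rng.normal(size=(n_c, d))        # context values

out, weights = biased_cross_attention(q, k, v, x_t, x_c)
```

Two properties of this sketch are worth noting. The bias depends only on differences between input locations, so shifting all inputs by the same offset leaves it unchanged. And when the queries are all zero, the attention weights reduce to normalized RBF kernel weights, which is the weighting used in Nadaraya-Watson kernel regression.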
