Transformer Neural Processes - Kernel Regression

Daniel Jenson; Jhonathan Navott; Mengyan Zhang; Makkunda Sharma; Elizaveta Semenova; Seth Flaxman

arXiv:2411.12502·cs.LG·April 20, 2026

TL;DR

The paper introduces TNP-KR, a scalable Transformer Neural Process variant with two new attention mechanisms. It runs inference with 100K context points and 1M test points in under a minute on a single GPU, and its SA variant achieves state-of-the-art results.

Contribution

It proposes TNP-KR, which combines a Kernel Regression Block (KRBlock), a kernel-based attention bias, and two new attention mechanisms, improving scalability and performance over prior Neural Processes.

Findings

01

TNP-KR handles 100K context points and 1M test points in under a minute on a single GPU.

02

TNP-KR with DKA outperforms Performer on most benchmarks.

03

TNP-KR with scan attention (SA) achieves state-of-the-art results.

Abstract

Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. Originally developed as a scalable alternative to Gaussian Processes (GPs), which are limited by $O(n^{3})$ runtime complexity, the most accurate modern NPs can often rival GPs but still suffer from an $O(n^{2})$ bottleneck due to their attention mechanism. We introduce the Transformer Neural Process - Kernel Regression (TNP-KR), a scalable NP featuring: (1) a Kernel Regression Block (KRBlock), a simple, extensible, and parameter-efficient transformer block with complexity $O(n_{c}^{2} + n_{c} n_{t})$, where $n_{c}$ and $n_{t}$ are the number of context and test points, respectively; (2) a kernel-based attention bias; and (3) two novel attention mechanisms: scan attention (SA), a memory-efficient scan-based attention that when paired with a kernel-based…
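
To make the $O(n_{c}^{2} + n_{c} n_{t})$ term concrete, here is a minimal pure-Python sketch. It is not the authors' KRBlock. It assumes, as the complexity formula suggests, that context points attend to each other and that test points attend only to the context, and it omits projections, residual connections, the kernel-based bias, and everything else a real transformer block contains. The names `attend`, `kr_style_block`, and `score_counts` are hypothetical and exist only for this illustration.

```python
import math


def attend(queries, keys, values, counter):
    """Plain softmax dot-product attention over lists of float vectors.

    counter is a one-element list used to count query-key score evaluations.
    """
    out = []
    for q in queries:
        scores = []
        for k in keys:
            counter[0] += 1  # one query-key score evaluated
            scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)))
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return out


def kr_style_block(context, test):
    """Assumed attention pattern: test -> context and context -> context."""
    counter = [0]
    new_test = attend(test, context, context, counter)        # n_c * n_t scores
    new_context = attend(context, context, context, counter)  # n_c ** 2 scores
    return new_context, new_test, counter[0]


def score_counts(n_c, n_t):
    """Score evaluations: full self-attention over all points vs. the pattern above."""
    full = (n_c + n_t) ** 2
    kr_style = n_c ** 2 + n_c * n_t
    return full, kr_style


context = [[0.1 * i, 1.0] for i in range(4)]  # n_c = 4 toy embeddings
test = [[0.5 * j, 1.0] for j in range(3)]     # n_t = 3 toy embeddings
new_context, new_test, n_scores = kr_style_block(context, test)
full, kr_style = score_counts(100_000, 1_000_000)
```

With the toy inputs, `n_scores` is 4² + 4·3 = 28. At the sizes quoted in the findings (100K context points, 1M test points), `score_counts` gives 1.21 × 10¹² score evaluations for full self-attention over all points against 1.1 × 10¹¹ for the assumed pattern, an 11-fold reduction. The gap widens as the number of test points grows relative to the context.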
