Speculative Sampling for Parametric Temporal Point Processes

Marin Biloš; Anderson Schneider; Yuriy Nevmyvaka

arXiv:2510.20031·cs.LG·October 24, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a rejection sampling algorithm for temporal point processes that allows for parallel, exact sampling of multiple future events, significantly improving efficiency without altering existing models.

Contribution

The authors propose a novel parallel sampling method for TPPs that maintains exactness and requires no retraining or architectural modifications.

Findings

01 · Achieves empirical speedups on real-world datasets.

02 · Provides theoretical guarantees for the sampling method.

03 · Enables efficient large-scale event sequence generation.

Abstract

Temporal point processes are powerful generative models for event sequences that capture complex dependencies in time-series data. They are commonly specified using autoregressive models that learn the distribution of the next event from the previous events. This makes sampling inherently sequential, limiting efficiency. In this paper, we propose a novel algorithm based on rejection sampling that enables exact sampling of multiple future values from existing TPP models, in parallel, and without requiring any architectural changes or retraining. Besides theoretical guarantees, our method demonstrates empirical speedups on real-world datasets, bridging the gap between expressive modeling and efficient parallel generation for large-scale TPP applications.
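The accept/reject step that the abstract refers to can be illustrated with plain rejection sampling on a single inter-event time. This is a minimal sketch under stated assumptions, not the authors' algorithm: the target and proposal are both taken to be exponential densities, the function name `rejection_sample_exponential` is hypothetical, and the rejection constant M = rate_target / rate_proposal is derived for this toy case only. Because each proposal is drawn independently of the others, the loop body could in principle be evaluated as one batch.

```python
import math
import random


def rejection_sample_exponential(rate_target, rate_proposal, n_proposals, rng):
    """Return exact samples from Exp(rate_target) using Exp(rate_proposal) proposals.

    Toy assumption: both densities are exponential, so the ratio p(x)/q(x)
    is bounded by M = rate_target / rate_proposal whenever
    rate_proposal <= rate_target.
    """
    if not 0 < rate_proposal <= rate_target:
        raise ValueError("need 0 < rate_proposal <= rate_target so p/q is bounded")
    accepted = []
    for _ in range(n_proposals):
        # Draw a candidate inter-event time from the proposal density q.
        x = rng.expovariate(rate_proposal)
        # p(x) / (M * q(x)) simplifies to exp(-(rate_target - rate_proposal) * x).
        accept_prob = math.exp(-(rate_target - rate_proposal) * x)
        if rng.random() < accept_prob:
            accepted.append(x)
    return accepted


rng = random.Random(0)
samples = rejection_sample_exponential(2.0, 1.0, 100_000, rng)
print(len(samples) / 100_000)       # empirical acceptance rate, expected near 1/M
print(sum(samples) / len(samples))  # empirical mean, expected near 1/rate_target
```

In this toy setting the expected acceptance rate is 1/M, so a proposal that is closer to the target wastes fewer draws. That is the general trade-off in any rejection-based scheme.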

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 2

Strengths

1. The paper identifies a relevant problem — the inefficiency of sequential sampling in autoregressive TPPs — and attempts to address it using ideas from speculative decoding. 2. The proposed algorithm is conceptually simple and can be implemented without retraining existing models. 3. Theoretical analysis (Section 3) is relatively clear, providing proofs for the bounding procedure.

Weaknesses

1. Limited Novelty and Conceptual Contribution: 1) The proposed approach is a straightforward adaptation of speculative decoding to temporal point processes. There is no fundamental theoretical or algorithmic innovation beyond applying rejection sampling to a different data modality. 2) The main “novelty” (deriving rejection constants for specific distributions) is technical but not conceptual; such derivations follow standard bounding techniques from the rejection sampling literature. 2. Theoret

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper is easy to follow, and its main idea is clear and straightforward. The paper demonstrates notable originality by addressing a core inefficiency in TPP sampling—sequential generation bottlenecks—through a novel speculative sampling framework that avoids modifying existing models. Unlike prior works (e.g., Gloeckle et al., 2024; Zeng et al., 2023) that require retraining TPPs to predict multiple steps, this method leverages pre-trained encoders’ proposal distributions to generate paralle

Weaknesses

The main weakness of this paper is its presentation. Some parts are not clear enough. See my comments in "Questions" section.

Reviewer 03 · Rating 6 · Confidence 2

Strengths

1. The work is well-motivated, addressing the critical bottleneck of inefficient sampling in autoregressive Temporal Point Processes (TPPs). Addressing this major limitation could have significant implications for real-time applications. 2. The paper's claims are supported by strong empirical results. The experiments demonstrate substantial efficiency improvements in wall-clock time compared to conventional sequential sampling, as presented in Table 3. Furthermore, the authors validate their clai

Weaknesses

1. A significant limitation of this approach is its reliance on a closed-form, parametric density function. Computing the rejection constant, particularly the general method in Section 3.2, requires the ability to evaluate both the target density and its derivative to find inflection points and construct the necessary linear bounds. This constraint makes the approach inapplicable to many intensity-based or diffusion-based TPP models. 2. The paper's claim in Section 3.4 that the hidden states $h

Taxonomy

Topics: Tensor decomposition and applications · Generative Adversarial Networks and Image Synthesis · Point processes and geometric inequalities