TPP-SD: Accelerating Transformer Point Process Sampling with Speculative Decoding
Shukai Gong, Yiyang Fu, Fengyuan Ran, Quyu Kong, Feng Zhou

TL;DR
TPP-SD is a speculative-decoding framework that accelerates sampling from Transformer temporal point processes by 2-6x while preserving the output distribution of standard autoregressive sampling.
Contribution
It adapts speculative decoding techniques from language models to TPP sampling, achieving 2-6x speedup without altering the output distribution.
Findings
Achieves 2-6x speedup in sampling
Maintains the same output distribution as standard autoregressive sampling
Effective across synthetic and real datasets
Abstract
We propose TPP-SD, a novel approach that accelerates Transformer temporal point process (TPP) sampling by adapting speculative decoding (SD) techniques from language models. By identifying the structural similarities between thinning algorithms for TPPs and speculative decoding for language models, we develop an efficient sampling framework that leverages a smaller draft model to generate multiple candidate events, which are then verified by the larger target model in parallel. TPP-SD maintains the same output distribution as autoregressive sampling while achieving significant acceleration. Experiments on both synthetic and real datasets demonstrate that our approach produces samples from the same distribution as standard methods, but with a 2-6x speedup. Our ablation studies analyze the impact of hyperparameters such as draft length and draft model size on sampling efficiency.…
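The draft-then-verify idea in the abstract can be illustrated with the generic accept/reject rule of speculative sampling. The sketch below is not the authors' implementation: it is a toy over a small discrete set (think of event types only), and the function name `speculative_step` and the fixed probability lists are assumptions made for illustration. A full TPP sampler would also handle continuous inter-event times and would verify several drafted events in a single pass of the target model.

```python
import random


def speculative_step(p_target, q_draft, rng):
    """One accept/reject step of speculative sampling over a discrete set.

    p_target, q_draft: probability lists over the same categories
    (hypothetical stand-ins for the target and draft model outputs).
    Returns (category, accepted).
    """
    # The cheap draft model proposes a candidate.
    x = rng.choices(range(len(q_draft)), weights=q_draft)[0]

    # The target model accepts it with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True

    # On rejection, resample from the normalized residual max(p - q, 0).
    # A rejection implies p != q, so the residual has positive total mass.
    residual = [max(p - q, 0.0) for p, q in zip(p_target, q_draft)]
    y = rng.choices(range(len(residual)), weights=residual)[0]
    return y, False


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.6, 0.3, 0.1]  # toy target distribution
    q = [0.3, 0.3, 0.4]  # toy draft distribution
    n = 20000
    counts = [0] * len(p)
    for _ in range(n):
        x, _ = speculative_step(p, q, rng)
        counts[x] += 1
    # Empirical frequencies should be close to p, not q.
    print([round(c / n, 3) for c in counts])
```

The combination of the acceptance test and the residual resampling is what makes the returned sample follow the target distribution exactly, which is the property the abstract refers to as maintaining the same output distribution.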
Taxonomy
Topics: Medical Image Segmentation Techniques · Machine Learning in Materials Science · Medical Imaging Techniques and Applications
