Mamba Hawkes Process
Anningzhe Gao, Shan Dai, Yan Hu

TL;DR
The paper introduces the Mamba Hawkes Process (MHP), a model that uses the Mamba state space architecture to capture long-range dependencies and dynamic interactions in event sequences. The authors report that it outperforms existing models across various datasets.
Contribution
The paper presents the Mamba Hawkes Process (MHP), which combines state space models with Hawkes processes, and its extension (MHP-E), and provides theoretical and empirical evidence of improved performance.
Findings
MHP outperforms existing models on various datasets.
MHP-E enhances predictive capabilities by combining Mamba and Transformer architectures.
Theoretical analysis shows synergy between state space models and Hawkes processes.
Abstract
Irregular and asynchronous event sequences are prevalent in many domains, such as social media, finance, and healthcare. Traditional temporal point processes (TPPs), like Hawkes processes, often struggle to model mutual inhibition and nonlinearity effectively. While recent neural network models, including RNNs and Transformers, address some of these issues, they still face challenges with long-term dependencies and computational efficiency. In this paper, we introduce the Mamba Hawkes Process (MHP), which leverages the Mamba state space architecture to capture long-range dependencies and dynamic event interactions. Our results show that MHP outperforms existing models across various datasets. Additionally, we propose the Mamba Hawkes Process Extension (MHP-E), which combines Mamba and Transformer models to enhance predictive capabilities. We present the novel application of the Mamba…
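For context on the limitation the abstract mentions, here is a minimal sketch of the classical (linear) Hawkes conditional intensity with an exponential kernel, λ(t) = μ + Σ α·exp(−β(t − t_i)) over past events t_i. This is the textbook baseline, not the MHP model; the function name and the parameter values (`mu`, `alpha`, `beta`) are illustrative assumptions, not taken from the paper.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Intensity of a univariate Hawkes process with an exponential kernel.

    mu, alpha, beta are illustrative values (assumptions, not from the paper).
    """
    # Each event strictly before t adds an exponentially decaying bump.
    excitation = sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)
    return mu + excitation

events = [1.0, 1.5, 4.0]                      # hypothetical event times
baseline = hawkes_intensity(0.5, events)      # no past events yet, so equals mu
after_burst = hawkes_intensity(1.6, events)   # two recent events raise the rate
```

With `alpha >= 0` every past event can only raise the intensity above `mu`, so this linear form cannot express one event suppressing another. That is the kind of inhibition and nonlinearity the abstract says traditional Hawkes processes struggle with.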
Taxonomy
Topics: Point processes and geometric inequalities
Methods: Linear Layer · Multi-Head Attention · Attention Is All You Need · Softmax · Byte Pair Encoding · Layer Normalization · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam
