RoTHP: Rotary Position Embedding-based Transformer Hawkes Process
Anningzhe Gao, Shan Dai

TL;DR
This paper introduces RoTHP, a novel Transformer Hawkes Process using rotary position embeddings, which improves sequence prediction flexibility and robustness to temporal noise in modeling asynchronous event data.
Contribution
The paper proposes a rotary position embedding-based architecture for Transformer Hawkes Processes, enhancing sequence prediction and robustness to temporal variations.
Findings
RoTHP exhibits better generalization to timestamp translations.
RoTHP demonstrates improved sequence prediction accuracy.
Theoretical analysis confirms translation invariance and flexibility.
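The translation-invariance finding can be illustrated with a small NumPy sketch. It assumes, for illustration only and not as the paper's exact formulation, that each query and key vector is rotated pairwise by angles proportional to the event timestamp. The function names, the frequency schedule, and the random inputs are all hypothetical. Under that assumption the attention score between events i and j depends only on the difference of their timestamps, so shifting every timestamp by the same constant leaves the scores unchanged.

```python
import numpy as np

def rotate_by_time(x, t, freqs):
    """Rotate consecutive feature pairs of x by angles t * freqs (rotary-style).

    Hypothetical helper: x has shape (n, d) with d even, t has shape (n,).
    """
    angles = np.outer(t, freqs)                 # shape (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin   # 2-D rotation, first coordinate
    out[:, 1::2] = x_even * sin + x_odd * cos   # 2-D rotation, second coordinate
    return out

def rotary_scores(q, k, t, freqs):
    """Unnormalised attention scores after rotating queries and keys by time."""
    return rotate_by_time(q, t, freqs) @ rotate_by_time(k, t, freqs).T

rng = np.random.default_rng(0)
n, d = 5, 8
q = rng.normal(size=(n, d))                     # stand-in query vectors
k = rng.normal(size=(n, d))                     # stand-in key vectors
t = np.sort(rng.uniform(0.0, 10.0, size=n))     # synthetic event timestamps
freqs = 1.0 / (10000.0 ** (2.0 * np.arange(d // 2) / d))  # assumed schedule

base = rotary_scores(q, k, t, freqs)
shifted = rotary_scores(q, k, t + 3.7, freqs)   # translate every timestamp

# The two score matrices agree up to floating-point error.
print(np.allclose(base, shifted))
```

The invariance follows from the identity that the inner product of two vectors rotated by angles a and b depends only on a - b, which holds for each feature pair independently.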
Abstract
Temporal Point Processes (TPPs), especially Hawkes processes, are commonly used to model asynchronous event sequence data such as financial transactions and user behavior in social networks. Owing to the strong fitting ability of neural networks, various neural Temporal Point Processes have been proposed, among which self-attention-based Neural Hawkes Processes such as the Transformer Hawkes Process (THP) achieve distinct performance improvements. Although THP has been studied increasingly, it still suffers from the sequence prediction issue, i.e., training on history sequences and inferring the future, which is a prevalent paradigm in realistic sequence analysis tasks. Moreover, conventional THP and its variants simply adopt the original sinusoidal embedding from transformers, which shows performance sensitivity to temporal change or noise in sequence data analysis by our…
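To make the sensitivity of additive sinusoidal embeddings concrete, the following NumPy sketch adds a sinusoidal encoding of each timestamp to an event vector and computes raw attention scores before and after shifting all timestamps. It is a simplified stand-in and not the THP implementation: the function names, the random projection matrices `w_q` and `w_k`, and the synthetic timestamps are assumptions. It only shows that the scores change under a shift; it does not measure any effect on model performance.

```python
import numpy as np

def sinusoidal_encoding(t, d):
    """Absolute sinusoidal encoding of timestamps t, shape (n, d), d even."""
    i = np.arange(d // 2)
    angles = np.outer(t, 1.0 / (10000.0 ** (2.0 * i / d)))
    enc = np.empty((len(t), d))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def additive_scores(x, t, w_q, w_k):
    """Unnormalised attention scores with the encoding added to the inputs."""
    h = x + sinusoidal_encoding(t, x.shape[1])
    return (h @ w_q) @ (h @ w_k).T

rng = np.random.default_rng(0)
n, d = 5, 8
x = rng.normal(size=(n, d))                  # stand-in event embeddings
w_q = rng.normal(size=(d, d))                # hypothetical query projection
w_k = rng.normal(size=(d, d))                # hypothetical key projection
t = np.sort(rng.uniform(0.0, 10.0, size=n))  # synthetic event timestamps

base = additive_scores(x, t, w_q, w_k)
shifted = additive_scores(x, t + 3.7, w_q, w_k)  # translate every timestamp

# Largest absolute change in any score caused by the shift.
max_change = np.abs(base - shifted).max()
print(max_change)
```

The scores move because the cross terms between the event vectors and the encodings depend on the absolute timestamps, not only on their differences.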
Taxonomy
Topics: Point processes and geometric inequalities
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Position-Wise Feed-Forward Layer · Dropout · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding
