Temporal Attention Augmented Transformer Hawkes Process
Lu-ning Zhang, Jian-wei Liu, Zhi-yan Song, Xin Zuo

TL;DR
This paper introduces TAA-THP, a Transformer-based Hawkes process that incorporates temporal information directly into the attention mechanism, improving log-likelihood and the prediction of event types and occurrence times on asynchronous event sequences.
Contribution
The paper proposes the Temporal Attention Augmented Transformer Hawkes Process (TAA-THP), which integrates temporal encoding into the attention structure itself, addressing the limited use of temporal information in earlier Transformer Hawkes processes.
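The idea of feeding temporal encoding into the attention structure can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the sinusoidal encoding, the additive query-to-time score term, and the names `temporal_encoding`, `temporal_augmented_attention`, and `w_tem` are all assumptions made for illustration. It shows one plausible way for attention scores to depend on event timestamps as well as on event embeddings.

```python
import numpy as np

def temporal_encoding(times, d_model):
    """Sinusoidal encoding of continuous timestamps (assumed form, for illustration)."""
    times = np.asarray(times, dtype=float)
    i = np.arange(d_model)
    rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / d_model)
    angles = times[:, None] * rates[None, :]
    enc = np.empty_like(angles)
    enc[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return enc

def temporal_augmented_attention(x, times, w_q, w_k, w_v, w_tem):
    """Single-head causal attention with an extra, hypothetical temporal score term."""
    d_k = w_q.shape[1]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Project the temporal encoding so that queries can attend to it directly.
    t_key = temporal_encoding(times, x.shape[1]) @ w_tem
    # Content term plus temporal term (the temporal term is the assumption here).
    scores = (q @ k.T + q @ t_key.T) / np.sqrt(d_k)
    # Causal mask: an event may only attend to itself and earlier events.
    n = len(times)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights

# Toy usage with random embeddings and irregularly spaced event times.
rng = np.random.default_rng(0)
n_events, d_model, d_k = 5, 8, 4
times = np.array([0.0, 0.4, 1.1, 1.5, 3.2])
x = rng.normal(size=(n_events, d_model))
w_q, w_k, w_v, w_tem = (rng.normal(size=(d_model, d_k)) for _ in range(4))
out, weights = temporal_augmented_attention(x, times, w_q, w_k, w_v, w_tem)
```

Dropping the `q @ t_key.T` term recovers ordinary causal self-attention, in which timing can influence the scores only through whatever is already mixed into the input embeddings `x`.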
Findings
Significant improvement in log-likelihood on test datasets.
Enhanced prediction accuracy of event types and occurrence times.
Ablation studies confirm the effectiveness of temporal attention.
Abstract
In recent years, mining knowledge from asynchronous sequences with Hawkes processes has attracted sustained attention, and neural Hawkes processes, especially those based on recurrent neural networks (RNNs), have become one of the most actively researched areas. However, these models inherit shortcomings of RNNs, such as vanishing and exploding gradients and difficulty capturing long-term dependencies. Meanwhile, the Transformer, which is based on self-attention, has achieved great success in sequence modeling tasks such as text processing and speech recognition. Although the Transformer Hawkes process (THP) has delivered large performance gains, THPs do not make effective use of the temporal information in asynchronous events. For these sequences, the event occurrence instants are as important as the event types, while conventional THPs simply convert…
Taxonomy
Topics: Point processes and geometric inequalities · Hedgehog Signaling Pathway Studies · Diffusion and Search Dynamics
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Absolute Position Encodings · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Dense Connections
