Temporal Attention Augmented Transformer Hawkes Process

Lu-ning Zhang; Jian-wei Liu; Zhi-yan Song; Xin Zuo

arXiv:2112.14472 · cs.LG · December 30, 2021 · 1 citation

TL;DR

This paper introduces TAA-THP, a novel Transformer-based Hawkes process that effectively incorporates temporal information into the attention mechanism, leading to significant performance improvements in modeling asynchronous event sequences.

Contribution

The paper proposes a new Temporal Attention Augmented Transformer Hawkes Process that integrates temporal encoding into the attention structure, addressing limitations of previous models.
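As a rough illustration of what "integrating temporal encoding into the attention structure" could look like, the sketch below adds a timestamp-dependent term to ordinary scaled dot-product attention scores. This is an assumption made for illustration, not the paper's exact formulation: the sinusoidal `temporal_encoding`, the additive score term, and the function name `temporal_augmented_attention` are all hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def temporal_encoding(t, d_model):
    # ASSUMPTION: sinusoidal encoding of a continuous timestamp, borrowed
    # from Transformer positional encodings.
    return [
        math.sin(t / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(t / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]

def temporal_augmented_attention(queries, keys, values, times):
    # ASSUMPTION: the score for (i, j) is the usual content term q_i . k_j
    # plus a temporal term q_i . z_j, where z_j encodes the time of event j.
    d = len(queries[0])
    enc = [temporal_encoding(t, d) for t in times]
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: event i attends only to events 0..i.
        scores = [
            (dot(q, keys[j]) + dot(q, enc[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        all_weights.append(w)
        outputs.append([
            sum(w[j] * values[j][k] for j in range(i + 1))
            for k in range(len(values[0]))
        ])
    return outputs, all_weights

# Three toy event embeddings with their occurrence times.
events = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.1, 0.0]]
times = [0.0, 1.5, 4.0]
out, weights = temporal_augmented_attention(events, events, events, times)
```

The point of the sketch is only that timestamps enter the attention scores directly rather than being summed into the input embeddings once at the bottom of the network.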

Findings

01. Significant improvement in log-likelihood on test datasets.

02. Enhanced prediction accuracy of event types and occurrence times.

03. Ablation studies confirm the effectiveness of temporal attention.

Abstract

In recent years, mining knowledge from asynchronous sequences with Hawkes processes has drawn sustained attention, and neural Hawkes processes, especially those based on recurrent neural networks (RNNs), have become one of the most actively researched areas. However, these models inherit shortcomings of RNNs, such as vanishing and exploding gradients and difficulty capturing long-term dependencies. Meanwhile, the Transformer, built on self-attention, has achieved great success in sequence modeling tasks such as text processing and speech recognition. Although the Transformer Hawkes process (THP) has delivered large performance improvements, THPs do not effectively use the temporal information in asynchronous events. For these sequences, the instants at which events occur are as important as the event types, while conventional THPs simply convert…
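For background, the classical (non-neural) Hawkes process that this line of work builds on models the event rate as a baseline plus a decaying contribution from every past event. A minimal sketch follows, assuming an exponential kernel; the parameter names `mu`, `alpha`, and `beta` and their default values are illustrative and not taken from the paper.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    # Conditional intensity of a univariate Hawkes process with an
    # exponential kernel (ASSUMED kernel and parameter values):
    #   lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i))
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )

# Each past event raises the intensity, and the effect decays over time.
history = [1.0, 1.2, 3.0]
soon_after = hawkes_intensity(1.3, history)
much_later = hawkes_intensity(10.0, history)
```

Neural variants such as RNN-based models and THP replace this fixed parametric kernel with a learned representation of the event history.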

Taxonomy

Topics: Point processes and geometric inequalities · Hedgehog Signaling Pathway Studies · Diffusion and Search Dynamics

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Label Smoothing · Absolute Position Encodings · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Dense Connections