Hyper Hawkes Processes: Interpretable Models of Marked Temporal Point Processes
Alex Boyd, Andrew Warrington, Taha Kass-Hout, Parminder Bhatia, Danica Xiao

TL;DR
The paper introduces hyper Hawkes processes, a new family of marked temporal point process models that combine high predictive performance with interpretability by extending classical Hawkes processes with latent spaces and hypernetworks.
Contribution
It presents the hyper Hawkes process (HHP), a novel model family that enhances interpretability and flexibility while achieving state-of-the-art performance on benchmark tasks.
Findings
Achieves state-of-the-art performance on benchmark tasks.
Retains interpretability through linearity and structure of the original Hawkes process.
Provides tools for inspecting and understanding model predictions.
Abstract
Foundational marked temporal point process (MTPP) models, such as the Hawkes process, often use inexpressive model families in order to offer interpretable parameterizations of event data. On the other hand, neural MTPP models forego this interpretability in favor of absolute predictive performance. In this work, we present a new family of MTPP models: the hyper Hawkes process (HHP), which aims to be as flexible and performant as neural MTPPs, while retaining interpretable aspects. To achieve this, the HHP extends the classical Hawkes process to increase its expressivity by first expanding the dimension of the process into a latent space, and then introducing a hypernetwork to allow time- and data-dependent dynamics. These extensions define a highly performant MTPP family, achieving state-of-the-art performance across a range of benchmark tasks and metrics. Furthermore, by retaining the…
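For reference, the following is a minimal sketch of the classical Hawkes process that the abstract says the HHP extends. It is not the HHP itself. It assumes an exponential kernel with a single shared decay rate; the function names, the parameter names (`mu`, `alpha`, `beta`), and the numeric values are illustrative assumptions, not taken from the paper. The sketch shows why a linear intensity is easy to inspect: every past event adds its own term, so the intensity of each mark decomposes exactly into a baseline plus per-event contributions.

```python
import math


def hawkes_contributions(t, history, mu, alpha, beta):
    """Per-event additive contributions to each mark's intensity at time t.

    history: list of (event_time, mark) pairs (hypothetical input format)
    mu:      baseline rate per mark
    alpha:   alpha[k][m] is the excitation of mark k by a past event of mark m
    beta:    shared exponential decay rate (an assumption of this sketch)
    """
    contribs = []
    for t_i, m_i in history:
        if t_i < t:  # only strictly earlier events influence the intensity
            decay = math.exp(-beta * (t - t_i))
            contribs.append([alpha[k][m_i] * decay for k in range(len(mu))])
    return contribs


def hawkes_intensity(t, history, mu, alpha, beta):
    """Intensity per mark: baseline plus the sum of per-event contributions."""
    contribs = hawkes_contributions(t, history, mu, alpha, beta)
    return [mu[k] + sum(c[k] for c in contribs) for k in range(len(mu))]


# Illustrative values only: two marks, two past events.
mu = [0.1, 0.2]
alpha = [[0.5, 0.0], [0.3, 0.4]]
beta = 1.0
history = [(0.0, 0), (1.0, 1)]

contribs = hawkes_contributions(2.0, history, mu, alpha, beta)
lam = hawkes_intensity(2.0, history, mu, alpha, beta)
```

Because the intensity is a plain sum, each row of `contribs` can be read directly as how much one past event raises the rate of each mark at the query time.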
Peer Reviews
Decision·Submitted to ICLR 2026
The proposed model is well justified and described in enough detail in the main paper to give the reader a sense of the mechanism driving it. Unlike existing models that use a representation model for the history of events, it does so in an interpretable manner, i.e., the contribution of individual marks to the estimated intensities can also be estimated. Moreover, as the authors point out, the "transition" operator is both expressive and efficient due to the use
The main weakness of the proposed model lies in its experimental evaluation. Specifically, the proposed model has the best overall rank (Table 1) but outperforms the competing methods on only half of the metrics. The bigger issue is that the reported metrics do not account for variation, so it is very difficult to assess the significance of the results; the difference between HHP and DLHP may not be significant at all. Moreover, th
- The paper is well-written and easy to follow.
- Results on seven datasets for next-event prediction demonstrate log-likelihood performance that is competitive with baseline methods.
- The paper explores the interpretability of the proposed approach by adopting event-level attribution techniques.
- The paper appears to be a straightforward extension of the previously proposed DLHP approach with minimal modifications, namely, using a single-layer latent space and a hypernetwork to estimate event-specific decay rates.
- Given the limited technical contributions, the experimental results are underwhelming:
  1) Although the paper claims state-of-the-art performance, the reported log-likelihood appears comparable to baseline methods.
  2) Table 1: It is unclear why raw event accuracy metrics a
- Adapting neural networks to classical methods improves expressiveness but can make the model more of a “black box.” The authors address this by using eigenvector decomposition to keep the model interpretable, offering a clear particle-based view and attribution.
- They also overcome the common trade-off between performance and interpretability. By increasing the latent dimension, the model becomes more expressive and reduces this trade-off.
- One interesting observation in this work is that th
- Minor. The visualization of particle attribution in Figure 2 is difficult to interpret. Distinguishing sample lines using a dotted style or alternative markers could improve clarity.
- The model increases the latent dimension and adds more architectural components, which may lead to longer runtime. It is unclear whether the runtime is comparable to baseline methods.
Taxonomy
Topics: Point processes and geometric inequalities · Random Matrices and Applications · Geometric Analysis and Curvature Flows
