TL;DR
This paper shows that using the categorical distribution as a neural network output is effective for predicting both discrete and continuous event sequences, with applications in retinal prosthetics and synthetic data testing.
Contribution
It introduces the categorical distribution as a versatile output for neural networks in event prediction and presents new datasets for evaluating model scalability and discrete event times.
Findings
The categorical distribution is competitive for continuous-time event modeling.
A new neuronal spike prediction task, motivated by retinal prosthetics, shows why discrete-time event prediction is worth studying.
New synthetic datasets enable testing larger models.
Abstract
We demonstrate the effectiveness of the categorical distribution as a neural network output for next event prediction. This is done for both discrete-time and continuous-time event sequences. To model continuous-time processes, the categorical distribution is interpreted as a piecewise-constant density function and is shown to be competitive across a range of datasets. We then argue for the importance of studying discrete-time processes by introducing a neuronal spike prediction task motivated by retinal prosthetics, where discretization of event times is consequent on the task description. Separately, we show evidence that commonly used datasets favour smaller models. Finally, we introduce new synthetic datasets for testing larger models, as well as synthetic datasets with discrete event times.
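As a rough illustration of the piecewise-constant interpretation described in the abstract, the sketch below turns categorical logits over time bins into a log density for a continuous inter-event time. The function names, the bin edges, and the choice to assign zero density outside the binned range are assumptions made for this example; the paper's actual binning scheme and network are not given here.

```python
import bisect
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def piecewise_constant_log_density(logits, bin_edges, t):
    """Log density at time t when a categorical over bins is read as a
    piecewise-constant density (hypothetical helper, not the paper's code).

    Bin k covers [bin_edges[k], bin_edges[k + 1]); within it the density is
    the bin probability divided by the bin width.
    """
    if len(bin_edges) != len(logits) + 1:
        raise ValueError("need exactly one more bin edge than logits")
    if not (bin_edges[0] <= t < bin_edges[-1]):
        return -math.inf  # assumption: no mass outside the binned range
    probs = softmax(logits)
    k = bisect.bisect_right(bin_edges, t) - 1  # index of the bin containing t
    width = bin_edges[k + 1] - bin_edges[k]
    return math.log(probs[k]) - math.log(width)


# Uniform logits over four bins of unequal width (illustrative values only).
edges = [0.0, 0.5, 1.0, 2.0, 4.0]
logits = [0.0, 0.0, 0.0, 0.0]
nll = -piecewise_constant_log_density(logits, edges, 1.5)
```

For discrete event times, the same categorical output can be scored directly with the log probability of the observed bin, without the width term.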
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- Conceptual novelty: The paper introduces an interesting idea to represent inter-event times via categorical distributions instead of traditional continuous outputs. It provides a fresh and intuitive perspective on temporal point process modeling by focusing on probability mass rather than density.
- Comprehensive empirical evaluation: The authors conduct extensive experiments across a wide range of real-world and synthetic datasets. The results consistently demonstrate that categorical outputs
- Lack of focus and narrative coherence: The paper attempts to address multiple directions: categorical outputs, dataset scaling, and new synthetic processes. However, these threads are not well integrated. As a result, the core contribution (the categorical output formulation) is diluted by other sections, such as Metropolis Lognormal and Modulo Addition, which contribute little to the central research question.
- Insufficient theoretical depth: The paper’s arguments about regularization effec
This is mainly an experimental paper that seeks to demonstrate scenarios in which a discrete distribution is useful for characterizing event distributions, and the circumstances under which it is expected to perform better than a continuous specification. For the most part, the experiments are well motivated, justified, and convincing, and they are sufficiently detailed in the supplementary material.
The main weakness of the paper is that it seems to show two fairly evident phenomena in event prediction: that, in general, given sufficient sample size, continuous, discrete, and implicit (via sampling) event distribution estimates perform similarly, and that well-specified models (meaning those that match the underlying process) perform better. This naturally does not take away from the effort put in by the authors in demonstrating it empirically. Although the introduced artificial datasets help illustrat
- Focus on a practically motivated issue: real event-time data often contains discrete patterns (e.g., NYC taxi), where density-based likelihoods may be unstable.
- Clear and intuitive modeling approach.
- Contribution of new datasets may facilitate future research.
- Theoretical formulation lacks rigor and clarity.
- The hybrid model combining discrete time bins with continuous likelihoods is not formally defined, leaving ambiguity in the underlying probability space.
- The suitability of the negative log-likelihood objective is questionable in this mixed setting. For example, in the NYT dataset, likelihood can be artificially inflated by collapsing predictive variance toward zero at discrete timestamps, making likelihood comparisons unreliable.
- Tabl
The main strength is that it provides a different angle from which to look at next event prediction. The authors have conducted many experiments to support the claim made in the title, CATEGORICAL DISTRIBUTIONS ARE EFFECTIVE NEURAL NETWORK OUTPUTS FOR EVENT PREDICTION.
Originality: It is okay, but it could be improved with some theoretical insight into the gain in NLL (or LL) from the categorical distribution. Quality: The authors should include evaluation metrics other than NLL, such as MAE and MSE, in Figure 3. For Table 1: I am not sure if the proposed method is better. I also think readers would appreciate it if the authors elucidated, with some theoretical underpinnings, the threshold in the LL increases as the training set size increases in Figure 3. Clarit
