Disentangled Interleaving Variational Encoding
Noelle Y. L. Wong, Eng Yeow Cheu, Zhonglin Chiam, Dipti Srinivasan

TL;DR
This paper introduces DeepDIVE, a variational autoencoder that disentangles features in the latent space for multi-task learning, improving forecast accuracy by balancing reconstruction and forecasting objectives.
Contribution
The paper proposes a novel disentanglement approach in VAEs using probability theory, cross-attention, and a new loss function with theoretical convergence guarantees.
Findings
DeepDIVE effectively disentangles input features in the latent space.
It achieves better forecast accuracy than standard VAEs.
It performs comparably to state-of-the-art baselines.
Abstract
Conflicting objectives present a considerable challenge in interleaving multi-task learning, necessitating meticulous design and balance to ensure effective learning of a representative latent data space across all tasks without mutual negative impact. Drawing inspiration from the concept of marginal and conditional probability distributions in probability theory, we design a principled and well-founded approach to disentangle the original input into marginal and conditional probability distributions in the latent space of a variational autoencoder. Our proposed model, Deep Disentangled Interleaving Variational Encoding (DeepDIVE), learns disentangled features from the original input to form clusters in the embedding space and unifies these features via the cross-attention mechanism in the fusion stage. We theoretically prove that combining the objectives for reconstruction…
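The marginal/conditional split described in the abstract can be written with the chain rule of probability. The sketch below borrows the notation that appears in the peer reviews ($b$ for the marginal latent dimensions, $a$ for the conditional ones) and assumes a prior that factorizes the same way, $p(a, b) = p(b)\,p(a \mid b)$. That prior is an assumption for illustration; the text on this page does not confirm the exact form used in the paper.

```latex
\begin{align}
q_{\phi}(a, b \mid x) &= q_{\phi}(b \mid x)\, q_{\phi}(a \mid x, b), \\
D_{\mathrm{KL}}\big(q_{\phi}(a, b \mid x) \,\|\, p(a, b)\big)
  &= D_{\mathrm{KL}}\big(q_{\phi}(b \mid x) \,\|\, p(b)\big)
   + \mathbb{E}_{q_{\phi}(b \mid x)}\Big[ D_{\mathrm{KL}}\big(q_{\phi}(a \mid x, b) \,\|\, p(a \mid b)\big) \Big].
\end{align}
```

The second line is the standard chain rule for KL divergence: the regularizer on the joint latent splits into a term for the marginal block and an expected term for the conditional block.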
Peer Reviews
Decision · Submitted to ICLR 2025
(1) The detailed theoretical derivations of the proposed ELBO for time-series forecasting are solid and make sense, and it could be promising and inspiring for future research on learning disentangled representations in time-series prediction tasks. (2) I appreciate the authors' efforts to report the variance (standard deviation) of the model's performance, which allows readers to better evaluate the results and comparisons.
The main weakness of this paper lies in its writing and evaluations. (1) The writing lacks consistency between the text, figures, and equations. For example: (1a) While the inference and KL divergence for q(b∣x) are well discussed, I had difficulty finding any description of how q(a∣x,b) is inferred. In Fig. 1, it appears that a is directly computed from x, which contradicts the equation, and I couldn't find any other text to explain this. (1b) Although the "cross-attention mechanism" and "fusion
- The loss function seems to be novel and theoretically grounded.
- Presentation: The paper is not well written; the idea is difficult to follow and understand. - The abstract is not specific. Multiple concepts such as multi-task learning, disentanglement, Naive Bayes, and other technical details are presented without a clear, coherent relationship among them. I suggest focusing on a central contribution, such as developing a non-conflicting objective for multi-task learning, and clarifying how elements like disentanglement, Naive Bayes, cross-entropy,
1. The paper is clearly written and easy to follow. 2. Experiments show that the proposed DeepDIVE framework outperforms existing baselines. 3. In terms of novelty, the DeepDIVE framework proposed in this work decomposes the latent space z into two distinct sets of dimensions: marginal dimensions b and conditional dimensions a. The marginal dimensions b capture general trends and are independent of each other, while the conditional dime
1. In the introduction section, the authors motivate the proposed DeepDIVE framework by criticizing existing deep learning approaches for time series forecasting as being black-box in nature and hard to optimize. However, time series forecasting (TSF) is a well-established problem. The authors should provide further explanation to better justify why the proposed framework is helpful in TSF. 2. Assumption 2, that $q_{\phi}(b_i,b_j|x) = q_{\phi}(b_i|x)q_{\phi}(b_j|x)$ for any i and j, is too st
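The independence condition quoted in the review above, $q_{\phi}(b_i,b_j|x) = q_{\phi}(b_i|x)q_{\phi}(b_j|x)$, holds by construction for a diagonal-Gaussian encoder, which is the usual choice in VAEs. Whether DeepDIVE uses exactly this parameterization is an assumption here. Under that assumption and a standard normal prior, the KL term splits into a sum of independent per-dimension terms, which the following minimal sketch checks numerically. The function names and the values of `mu` and `log_var` are hypothetical.

```python
import numpy as np

def kl_per_dim(mu, log_var):
    """KL( N(mu_i, exp(log_var_i)) || N(0, 1) ), computed separately for each dimension i."""
    return 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)

def kl_joint_to_standard(mu, cov):
    """KL( N(mu, cov) || N(0, I) ) using the general multivariate Gaussian formula."""
    k = mu.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (np.trace(cov) + mu @ mu - k - logdet)

# Hypothetical encoder outputs for a 3-dimensional marginal block b.
mu = np.array([0.5, -1.0, 0.2])
log_var = np.array([0.1, -0.3, 0.0])

# Sum of independent per-dimension KL terms.
kl_sum = kl_per_dim(mu, log_var).sum()

# Joint KL with a diagonal covariance, i.e. independent b_i given x.
kl_joint = kl_joint_to_standard(mu, np.diag(np.exp(log_var)))

print(np.isclose(kl_sum, kl_joint))
```

If the covariance had non-zero off-diagonal entries, the two quantities would generally differ, which is one way to see what the independence assumption rules out.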
Taxonomy
Topics: Algorithms and Data Compression · Cellular Automata and Applications
