State Space LSTM Models with Particle MCMC Inference
Xun Zheng, Manzil Zaheer, Amr Ahmed, Yuan Wang, Eric P. Xing, Alexander J. Smola

TL;DR
This paper introduces State Space LSTM models that integrate interpretability of state space models with the performance of LSTMs, using a particle MCMC inference method that outperforms previous approaches.
Contribution
It proposes a novel State Space LSTM model combined with a particle MCMC inference algorithm that avoids factorization assumptions, enhancing interpretability and inference accuracy.
Findings
SMC inference outperforms previous methods in stability and accuracy
State Space LSTM models improve interpretability of sequence models
Experimental results demonstrate the sampler's effectiveness across a variety of domains
Abstract
Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite its strong performance, however, it lacks the interpretability of state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models, which generalize the earlier work \cite{zaheer2017latent} on combining topic models with LSTM. However, unlike \cite{zaheer2017latent}, we do not make any factorization assumptions in our inference algorithm. We present an efficient sampler based on the sequential Monte Carlo (SMC) method that draws from the joint posterior directly. Experimental results confirm the superiority and stability of this SMC inference algorithm on a variety of domains.
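The abstract does not spell out the sampler, so the following is only a generic illustration of the sequential Monte Carlo idea it refers to, not the authors' particle MCMC algorithm for SSL. It is a minimal bootstrap particle filter for a toy linear-Gaussian state space model; the model, the function name `bootstrap_particle_filter`, and the parameters `a`, `q`, `r` are all assumptions chosen for illustration.

```python
import numpy as np


def bootstrap_particle_filter(observations, n_particles=500, a=0.9, q=1.0, r=1.0, seed=0):
    """Bootstrap particle filter for an assumed toy model (not the SSL model):

        z_t = a * z_{t-1} + N(0, q)   (latent state)
        x_t = z_t + N(0, r)           (observation)

    Returns the filtered means E[z_t | x_1..x_t] and an estimate of the
    log marginal likelihood log p(x_1..x_T).
    """
    rng = np.random.default_rng(seed)
    n_steps = len(observations)
    # Initial particles drawn from an assumed prior z_1 ~ N(0, q).
    particles = rng.normal(0.0, np.sqrt(q), size=n_particles)
    means = np.empty(n_steps)
    log_lik = 0.0

    for t, x in enumerate(observations):
        if t > 0:
            # Propagate each particle through the transition model.
            particles = a * particles + rng.normal(0.0, np.sqrt(q), size=n_particles)

        # Weight particles by the observation likelihood N(x; z, r), in log space.
        log_w = -0.5 * np.log(2.0 * np.pi * r) - 0.5 * (x - particles) ** 2 / r
        shift = log_w.max()
        w = np.exp(log_w - shift)

        # The average unnormalized weight estimates p(x_t | x_1..x_{t-1}).
        log_lik += shift + np.log(w.mean())

        w /= w.sum()
        means[t] = np.sum(w * particles)

        # Multinomial resampling: duplicate likely particles, drop unlikely ones.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]

    return means, log_lik


# Usage: simulate data from the same toy model and run the filter.
sim_rng = np.random.default_rng(1)
T, a_true = 200, 0.9
latent = np.empty(T)
latent[0] = sim_rng.normal()
for t in range(1, T):
    latent[t] = a_true * latent[t - 1] + sim_rng.normal()
obs = latent + sim_rng.normal(size=T)

filtered_means, log_marginal = bootstrap_particle_filter(obs, a=a_true)
print("filter MSE:", np.mean((filtered_means - latent) ** 2))
print("log marginal likelihood estimate:", log_marginal)
```

In this sketch the particles jointly represent the posterior over the latent state, so no factorized approximation is imposed; a particle MCMC scheme would embed a sampler of this kind inside an outer MCMC loop, which is not shown here.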
Taxonomy
Topics: Model Reduction and Neural Networks · Machine Learning in Materials Science · Neural Networks and Applications
Methods: Interpretability · Sigmoid Activation · Tanh Activation · Affine Coupling · Normalizing Flows · Long Short-Term Memory
