Dual Filter: A Transformer-like Inference Architecture for Hidden Markov Models
Heng-Sheng Chang, Prashant G. Mehta

TL;DR
This paper introduces a novel dual filter algorithm, inspired by the transformer architecture, for causal nonlinear prediction in hidden Markov models. It combines optimal control theory with iterative filtering techniques.
Contribution
It develops a transformer-like inference architecture for HMMs based on an optimal control framework, providing a new mathematical foundation and algorithmic approach.
Findings
The dual filter algorithm closely parallels the transformer architecture.
Numerical experiments demonstrate effective prediction performance.
The framework offers a new perspective on transformer modeling as transport on probability measures.
Abstract
This paper presents a mathematical framework for causal nonlinear prediction in settings where observations are generated from an underlying hidden Markov model (HMM). Both the problem formulation and the proposed solution are motivated by the decoder-only transformer architecture, in which a finite sequence of observations (tokens) is mapped to the conditional probability of the next token. Our objective is not to construct a mathematical model of a transformer. Rather, our interest lies in deriving, from first principles, transformer-like architectures that solve the prediction problem for which the transformer is designed. The proposed framework is based on an original optimal control approach, where the prediction objective (MMSE) is reformulated as an optimal control problem. An analysis of the optimal control problem is presented leading to a fixed-point equation on the space of…
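To make the prediction problem concrete, the sketch below computes the conditional probability of the next token for a finite-state HMM using the classical forward (Bayes) filter. This is the textbook baseline for that mapping, not the paper's dual filter. The function name, the argument names `A`, `C`, and `mu`, and the toy numbers are assumptions made for illustration only.

```python
def next_token_distribution(tokens, A, C, mu):
    """Classical HMM forward filter (illustrative; NOT the paper's dual filter).

    tokens : list of observed token indices z_1, ..., z_T
    A[i][j]: transition probability P(X_{t+1} = j | X_t = i)
    C[i][z]: emission probability   P(Z_t = z | X_t = i)
    mu[i]  : prior probability      P(X_1 = i)
    Returns the list P(Z_{T+1} = z | z_1, ..., z_T) over all tokens z.
    """
    d = len(mu)      # number of hidden states
    m = len(C[0])    # vocabulary size
    pred = list(mu)  # predicted law of the current hidden state
    for z in tokens:
        # Bayes update: condition the hidden state on the observed token.
        post = [pred[i] * C[i][z] for i in range(d)]
        total = sum(post)
        post = [p / total for p in post]
        # Time update: push the posterior through the transition matrix.
        pred = [sum(post[i] * A[i][j] for i in range(d)) for j in range(d)]
    # Next-token law: mix the emission rows by the predicted state law.
    return [sum(pred[i] * C[i][z] for i in range(d)) for z in range(m)]


# Toy two-state, two-token model (hypothetical numbers).
A = [[0.9, 0.1], [0.2, 0.8]]
C = [[0.8, 0.2], [0.3, 0.7]]
mu = [0.5, 0.5]
probs = next_token_distribution([0, 1, 1], A, C, mu)
```

The filter is causal: each step uses only the tokens seen so far, which matches the decoder-only setting described in the abstract.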
Taxonomy
Topics: Neural Networks and Applications
