Dual Filter: A Transformer-like Inference Architecture for Hidden Markov Models

Heng-Sheng Chang; Prashant G. Mehta

arXiv:2505.00818·cs.LG·March 16, 2026

TL;DR

This paper introduces the dual filter, an algorithm inspired by the transformer architecture, for causal nonlinear prediction in hidden Markov models (HMMs). It reformulates the MMSE prediction objective as an optimal control problem and combines that formulation with iterative filtering.

Contribution

The paper derives a transformer-like inference architecture for HMMs from an optimal control formulation of the prediction problem, providing both a mathematical foundation and an algorithm.

Findings

01. The dual filter algorithm closely parallels the transformer architecture.

02. Numerical experiments demonstrate effective prediction performance.

03. The framework offers a new perspective on transformer modeling as transport on probability measures.

Abstract

This paper presents a mathematical framework for causal nonlinear prediction in settings where observations are generated from an underlying hidden Markov model (HMM). Both the problem formulation and the proposed solution are motivated by the decoder-only transformer architecture, in which a finite sequence of observations (tokens) is mapped to the conditional probability of the next token. Our objective is not to construct a mathematical model of a transformer. Rather, our interest lies in deriving, from first principles, transformer-like architectures that solve the prediction problem for which the transformer is designed. The proposed framework is based on an original optimal control approach, where the prediction objective (MMSE) is reformulated as an optimal control problem. An analysis of the optimal control problem is presented leading to a fixed-point equation on the space of…
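To make the prediction problem concrete, the sketch below computes the conditional probability of the next observation for a finite-state HMM using the classical forward (Bayes) filter. This is the standard textbook recursion, not the paper's dual filter; it only illustrates the input-output map (observation sequence in, next-token distribution out) that the dual filter is meant to compute. The names `prior`, `A`, `C`, and `next_token_distribution`, the toy numbers, and the convention that observation t is emitted by hidden state t are all assumptions made for illustration.

```python
def next_token_distribution(prior, A, C, observations):
    """Return P(Z_{T+1} = z | Z_1, ..., Z_T) for a finite-state HMM.

    Assumed (hypothetical) conventions:
      prior[i] = P(X_1 = i)
      A[i][j]  = P(X_{t+1} = j | X_t = i)
      C[i][z]  = P(Z_t = z | X_t = i)
    """
    d = len(prior)
    pred = list(prior)  # distribution of the current hidden state given past tokens
    for z in observations:
        # Bayes update: condition the hidden state on the observed token z.
        unnorm = [pred[i] * C[i][z] for i in range(d)]
        total = sum(unnorm)
        if total == 0.0:
            raise ValueError("observation sequence has zero probability under the model")
        post = [u / total for u in unnorm]
        # Propagate one step through the Markov transition matrix.
        pred = [sum(post[i] * A[i][j] for i in range(d)) for j in range(d)]
    # Push the predicted hidden-state distribution through the emission model.
    m = len(C[0])
    return [sum(pred[i] * C[i][z] for i in range(d)) for z in range(m)]


# Toy model (illustrative numbers only): 2 hidden states, 2 tokens.
prior = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
C = [[0.8, 0.2], [0.3, 0.7]]

print(next_token_distribution(prior, A, C, [0, 0, 1]))
```

The returned vector is the conditional expectation of the one-hot encoded next token, so it is also the MMSE estimate of that encoding given the observed sequence.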

Taxonomy

Topics: Neural Networks and Applications