Beyond ZOH: Advanced Discretization Strategies for Vision Mamba

Fady Ibrahim; Guangjun Liu; Guanghui Wang

arXiv:2604.20606 · cs.CV · April 23, 2026

TL;DR

This paper systematically compares six discretization schemes within Vision Mamba and finds that the bilinear transform offers the best balance of accuracy and efficiency for SSM-based vision models.
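For reference, the standard ZOH and bilinear (Tustin) discretizations of a continuous-time SSM h'(t) = A h(t) + B x(t) with step size Δ are written out below in Mamba-style notation. These are textbook forms assumed here for illustration. This summary does not give the paper's exact parameterization, nor that of the other four schemes. The last line shows the one-sample bilinear variant used by some SSM implementations such as S4.

```latex
\begin{aligned}
&\text{Continuous SSM:} && h'(t) = A\,h(t) + B\,x(t) \\
&\text{ZOH:} && \bar{A} = e^{\Delta A}, \quad \bar{B} = (\Delta A)^{-1}\left(e^{\Delta A} - I\right)\Delta B, \quad h_k = \bar{A}\,h_{k-1} + \bar{B}\,x_k \\
&\text{BIL (trapezoidal):} && \left(I - \tfrac{\Delta}{2}A\right) h_k = \left(I + \tfrac{\Delta}{2}A\right) h_{k-1} + \tfrac{\Delta}{2}\,B\,(x_{k-1} + x_k) \\
&\text{BIL (one-sample form):} && \bar{A} = \left(I - \tfrac{\Delta}{2}A\right)^{-1}\left(I + \tfrac{\Delta}{2}A\right), \quad \bar{B} = \left(I - \tfrac{\Delta}{2}A\right)^{-1}\Delta B, \quad h_k = \bar{A}\,h_{k-1} + \bar{B}\,x_k
\end{aligned}
```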

Contribution

The paper provides an empirical evaluation of six discretization methods within Vision Mamba and recommends the bilinear transform (BIL) as the default for improved performance.

Findings

1. Polynomial interpolation (POL) and higher-order hold (HOH) improve accuracy but increase training time.
2. The bilinear transform (BIL) delivers consistent improvements with modest overhead.
3. The choice of discretization significantly affects the performance of SSM-based vision models.

Abstract

Vision Mamba, as a state space model (SSM), employs a zero-order hold (ZOH) discretization, which assumes that input signals remain constant between sampling instants. This assumption degrades temporal fidelity in dynamic visual environments and constrains the attainable accuracy of modern SSM-based vision models. In this paper, we present a systematic and controlled comparison of six discretization schemes instantiated within the Vision Mamba framework: ZOH, first-order hold (FOH), bilinear/Tustin transform (BIL), polynomial interpolation (POL), higher-order hold (HOH), and the fourth-order Runge-Kutta method (RK4). We evaluate each method on standard visual benchmarks to quantify its influence on image classification, semantic segmentation, and object detection. Our results demonstrate that POL and HOH yield the largest gains in accuracy at the cost of higher training-time…
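To illustrate why the hold assumption matters, the sketch below discretizes a scalar SSM h'(t) = a h(t) + b x(t) driven by a ramp input x(t) = t. It uses four of the schemes named above and compares each one against the closed-form solution. This is a minimal sketch under stated assumptions, not the authors' implementation. The scalar (diagonal) setting, the function names, the parameter values, and the averaged mid-step input used for RK4 are all hypothetical choices made for illustration. POL and HOH are omitted because this summary does not state their exact formulation.

```python
import math

# Hypothetical toy setup (not the Vision Mamba code): scalar SSM
# h'(t) = a*h(t) + b*x(t), ramp input x(t) = t, h(0) = 0, and a != 0.


def exact_ramp(a, b, t):
    """Closed-form h(t) for h' = a*h + b*t with h(0) = 0."""
    return b * (math.exp(a * t) - 1.0 - a * t) / a**2


def step_zoh(h, a, b, x_prev, x_curr, dt):
    # ZOH: input held constant at x_curr over the step (x_prev unused).
    a_bar = math.exp(a * dt)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar * h + b_bar * x_curr


def step_foh(h, a, b, x_prev, x_curr, dt):
    # FOH: input linearly interpolated between samples, then integrated exactly.
    e = math.exp(a * dt)
    g0 = (e - 1.0) / a              # integral of e^{a(dt - s)} for s in [0, dt]
    g1 = (e - 1.0 - a * dt) / a**2  # integral of s * e^{a(dt - s)} for s in [0, dt]
    return e * h + b * (g0 * x_prev + g1 * (x_curr - x_prev) / dt)


def step_bilinear(h, a, b, x_prev, x_curr, dt):
    # Bilinear / Tustin: trapezoidal rule applied to h' = a*h + b*x.
    numer = (1.0 + 0.5 * dt * a) * h + 0.5 * dt * b * (x_prev + x_curr)
    return numer / (1.0 - 0.5 * dt * a)


def step_rk4(h, a, b, x_prev, x_curr, dt):
    # Classical RK4; the mid-step input is assumed to be the sample average.
    x_mid = 0.5 * (x_prev + x_curr)

    def f(hh, xx):
        return a * hh + b * xx

    k1 = f(h, x_prev)
    k2 = f(h + 0.5 * dt * k1, x_mid)
    k3 = f(h + 0.5 * dt * k2, x_mid)
    k4 = f(h + dt * k3, x_curr)
    return h + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def max_error(step, a=-1.0, b=1.0, dt=0.1, n=50):
    """Largest absolute deviation from the closed-form solution over n steps."""
    h, worst = 0.0, 0.0
    for k in range(1, n + 1):
        t_prev, t_curr = (k - 1) * dt, k * dt
        h = step(h, a, b, t_prev, t_curr, dt)  # samples of x(t) = t
        worst = max(worst, abs(h - exact_ramp(a, b, t_curr)))
    return worst


SCHEMES = {"ZOH": step_zoh, "FOH": step_foh, "BIL": step_bilinear, "RK4": step_rk4}

for name, step in SCHEMES.items():
    print(f"{name}: max abs error = {max_error(step):.2e}")
```

The ramp is exactly linear between samples, so FOH reproduces the closed-form solution up to rounding. ZOH holds each sample constant and therefore accumulates a visible lag. Real visual token sequences are not piecewise linear, so this toy comparison says nothing about the accuracy figures reported in the paper.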
