FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning

Waleed Razzaq; Yun-Bo Zhao

arXiv:2605.04421·cs.LG·May 7, 2026

TL;DR

FLUID is a continuous-time Transformer built around a Liquid Attention Network (LAN), which integrates continuous dynamics into the attention computation itself to improve modeling of irregular data, long-range dependencies, and physical dynamics.

Contribution

The paper proposes FLUID, a continuous-time (CT) Transformer that embeds continuous dynamics into attention, with stability guarantees and superior empirical performance across diverse tasks.

Findings

01  FLUID outperforms CT baselines by up to 47% in certain tasks.

02  It demonstrates robustness to noise and better generalization under distributional shifts.

03  FLUID achieves a balance between runtime and memory efficiency among competing models.

Abstract

Continuous-time (CT) Transformers improve irregular and long-range modeling over CT-RNNs by exploiting input or output embeddings with continuous dynamics. However, the core scaled dot-product attention (SDPA) mechanism remains inherently discrete. We propose FLUID (Flexible Unified Information Dynamics), a CT Transformer that incorporates continuous dynamics directly into the attention computation by replacing SDPA with a Liquid Attention Network (LAN). LAN reinterprets attention logits as a continuous dynamical system and reformulates them as the solution to a linear ODE modulated by input-dependent nonlinear recurrent gates. Theoretically, we establish stability guarantees for LAN dynamics and show that it serves as an interpolating middle ground between SDPA and CT-RNNs, recovering each as a special case under a well-defined parameterization of its gating functions. LAN also introduces an…
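
To make the idea of "attention logits as the solution to a gated linear ODE" concrete, here is a minimal toy sketch in plain Python. It is not the paper's LAN formulation, which the abstract does not spell out. It assumes a hypothetical scalar ODE dz/dt = -decay * z + drive * s for each logit, where s is the usual SDPA score and the gate is a made-up sigmoid of s. The function names (`ode_logit`, `toy_attention_weights`) and the parameter `gate_w` are illustrative assumptions only.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ode_logit(s, z0, decay, drive, t):
    # Closed-form solution at time t of the linear ODE
    #   dz/dt = -decay * z + drive * s,  z(0) = z0,  decay > 0.
    # The state relaxes exponentially toward the steady state (drive / decay) * s.
    steady = (drive / decay) * s
    return steady + (z0 - steady) * math.exp(-decay * t)

def toy_attention_weights(q, keys, t, gate_w=0.5):
    # ASSUMPTION: the gate below is a hypothetical input-dependent gate,
    # chosen only so that the decay rate stays strictly positive.
    d = len(q)
    logits = []
    for k in keys:
        s = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)  # SDPA score
        decay = 0.1 + sigmoid(gate_w * s)
        # Setting drive == decay makes the steady state equal to s,
        # so the SDPA logit is recovered as t grows large.
        logits.append(ode_logit(s, 0.0, decay, decay, t))
    return softmax(logits)

if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
    for t in (0.0, 1.0, 1000.0):
        print(t, toy_attention_weights(q, keys, t))
```

In this toy setting the weights are uniform at t = 0, because every logit starts at zero, and they converge to ordinary softmax attention as t grows. That mirrors, in a very simplified way, the abstract's claim that SDPA appears as a special case under a particular gate parameterization.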
