FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation

Huihan Wang; Zhiwen Yang; Hui Zhang; Dan Zhao; Bingzheng Wei; Yan Xu

arXiv:2506.04956 · cs.CV · June 6, 2025

TL;DR

FEAT is a full-dimensional efficient attention Transformer for medical video generation. It captures spatial, temporal, and channel dependencies at reduced computational complexity and adapts its denoising guidance to varying noise levels.

Contribution

The paper presents FEAT, a Transformer architecture for medical video synthesis that combines sequential spatial-temporal-channel attention, linear-complexity attention mechanisms in each dimension, and a residual value guidance module.

Findings

01. FEAT-S uses only 23% of the parameters of state-of-the-art models yet achieves comparable or better performance.

02. FEAT-L outperforms all comparison methods across multiple datasets.

03. FEAT demonstrates superior scalability and effectiveness in medical video generation.

Abstract

Synthesizing high-quality dynamic medical videos remains a significant challenge due to the need for modeling both spatial consistency and temporal dynamics. Existing Transformer-based approaches face critical limitations, including insufficient channel interactions, high computational complexity from self-attention, and coarse denoising guidance from timestep embeddings when handling varying noise levels. In this work, we propose FEAT, a full-dimensional efficient attention Transformer, which addresses these issues through three key innovations: (1) a unified paradigm with sequential spatial-temporal-channel attention mechanisms to capture global dependencies across all dimensions, (2) a linear-complexity design for attention mechanisms in each dimension, utilizing weighted key-value attention and global channel attention, and (3) a residual value guidance module that provides…
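To make the linear-complexity point concrete, the following is a minimal NumPy sketch of a generic "transposed" channel attention, in which the attention map is computed between channels rather than between tokens. It is not the FEAT implementation: the abstract does not specify the formulation of its global channel attention, so the function name `channel_attention`, the weight matrices `w_q`, `w_k`, `w_v`, and the L2 normalization step are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(x, w_q, w_k, w_v):
    """Hypothetical channel attention sketch (not the authors' code).

    x: (N, C) array of N tokens with C channels.
    Returns the attended output (N, C) and the attention map (C, C).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v  # each (N, C)
    # Assumption: L2-normalize every channel over the token axis so the
    # logits are bounded cosine similarities between channels.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
    # The map is C x C, so its cost is O(N * C^2): linear in N.
    attn = softmax(q.T @ k, axis=-1)     # (C, C), rows sum to 1
    out = v @ attn.T                     # (N, C)
    return out, attn

rng = np.random.default_rng(0)
C = 8
w_q, w_k, w_v = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
for n_tokens in (64, 256):
    x = rng.standard_normal((n_tokens, C))
    out, attn = channel_attention(x, w_q, w_k, w_v)
    print(n_tokens, out.shape, attn.shape)
```

In this sketch the attention map stays at 8 x 8 whether there are 64 or 256 tokens, whereas a token-to-token self-attention map would grow from 64 x 64 to 256 x 256. That difference is the general reason channel-wise attention scales linearly with sequence length.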

Code & Models

Repositories

yaziwel/feat (PyTorch, official)

Models

WTHH031230/FEAT (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Pose and Action Recognition

Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer