FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation
Huihan Wang, Zhiwen Yang, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

TL;DR
FEAT introduces a novel full-dimensional efficient attention Transformer for medical video generation, effectively capturing spatial, temporal, and channel dependencies with reduced computational complexity and improved noise adaptation.
Contribution
The paper presents FEAT, a new Transformer architecture with unified attention across all dimensions, linear complexity attention mechanisms, and a residual guidance module for medical video synthesis.
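To make "unified attention across all dimensions" concrete, the following is a minimal NumPy sketch of applying attention along the spatial, temporal, and channel axes in sequence by reshaping a video feature tensor. It is an illustration only, not the authors' implementation: it uses plain quadratic softmax attention with identity projections for readability, whereas FEAT replaces these with linear-complexity mechanisms. The function names and the (T, N, C) layout are assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Plain softmax self-attention over the second-to-last axis.

    x has shape (..., L, D). Identity projections are used (no learned
    weights), which is an assumption made to keep the sketch short.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x               # (..., L, D)

def sequential_stc_attention(video):
    """video: (T, N, C) = frames, spatial tokens per frame, channels."""
    # Spatial: tokens within each frame attend to each other.
    x = video + self_attention(video)
    # Temporal: each spatial location attends across frames.
    xt = np.swapaxes(x, 0, 1)                         # (N, T, C)
    x = x + np.swapaxes(self_attention(xt), 0, 1)
    # Channel: channels attend to each other; each channel is described
    # by its values over all T*N tokens.
    T, N, C = x.shape
    xc = x.reshape(T * N, C).T                        # (C, T*N)
    x = x + self_attention(xc).T.reshape(T, N, C)
    return x
```

The only thing that changes between the three steps is which axis plays the role of the sequence, which is why the three can be stacked inside a single block.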
Findings
FEAT-S uses only 23% of the parameters of state-of-the-art models yet achieves comparable or better performance.
FEAT-L outperforms all compared methods across multiple datasets.
FEAT demonstrates superior scalability and effectiveness in medical video generation.
Abstract
Synthesizing high-quality dynamic medical videos remains a significant challenge due to the need for modeling both spatial consistency and temporal dynamics. Existing Transformer-based approaches face critical limitations, including insufficient channel interactions, high computational complexity from self-attention, and coarse denoising guidance from timestep embeddings when handling varying noise levels. In this work, we propose FEAT, a full-dimensional efficient attention Transformer, which addresses these issues through three key innovations: (1) a unified paradigm with sequential spatial-temporal-channel attention mechanisms to capture global dependencies across all dimensions, (2) a linear-complexity design for attention mechanisms in each dimension, utilizing weighted key-value attention and global channel attention, and (3) a residual value guidance module that provides…
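As a rough illustration of what "linear complexity" means for attention, the sketch below uses generic kernelized linear attention with an elu(x) + 1 feature map. This is a swapped-in, well-known technique chosen for brevity; it is not the weighted key-value attention or the global channel attention described in the abstract. The point it shows is that reordering the matrix products avoids ever forming the L x L attention matrix, so cost grows linearly with the number of tokens L.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a strictly positive feature map.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """q, k: (L, D); v: (L, Dv). Cost is O(L * D * Dv)."""
    qf, kf = feature_map(q), feature_map(k)
    kv = kf.T @ v                  # (D, Dv) summary of keys and values
    z = kf.sum(axis=0)             # (D,) normalizer
    return (qf @ kv) / (qf @ z)[:, None]

def quadratic_reference(q, k, v):
    """Same result computed via the explicit (L, L) weight matrix."""
    qf, kf = feature_map(q), feature_map(k)
    w = qf @ kf.T                  # (L, L), quadratic in L
    return (w / w.sum(axis=1, keepdims=True)) @ v
```

Both functions return the same values; only the order of multiplication, and therefore the cost, differs.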
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Human Pose and Action Recognition
Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer
