RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation
Ahmed Marouane Djouama, Abir Belaala, Abdellah Zakaria Sellam, Salah Eddine Bekhouche, Cosimo Distante, Abdenour Hadid

TL;DR
RF-HiT is a computationally efficient rectified flow hierarchical transformer for medical image segmentation that reaches high accuracy in only a few inference steps and at low computational cost.
Contribution
It presents a novel rectified flow hierarchical transformer that combines efficiency with high segmentation performance, surpassing prior diffusion-based methods.
Findings
Achieves 91.27% mean Dice on the ACDC dataset.
Requires only 10.14 GFLOPs and 13.6M parameters.
Operates with as few as three inference steps.
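To make the "few inference steps" finding concrete, here is a minimal sketch of how a rectified flow model is typically sampled: a fixed-step Euler integration of dx/dt = v(x, t) from noise at t = 0 to a sample at t = 1. This is a generic illustration, not the paper's implementation. The names `euler_sample` and `toy_velocity` are hypothetical, and the velocity function is a toy stand-in for a trained, image-conditioned network.

```python
import numpy as np


def euler_sample(velocity_fn, x0, num_steps=3):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x


# Toy setup (assumption): a 2x2 "mask" as the target and Gaussian noise as the start.
target = np.array([[0.0, 1.0], [1.0, 0.0]])
noise = np.random.default_rng(0).standard_normal(target.shape)


def toy_velocity(x, t):
    # On a perfectly straight path x_t = (1 - t) * noise + t * target,
    # the velocity is the constant (target - noise). A trained network
    # would approximate this, conditioned on the input image.
    return target - noise


sample = euler_sample(toy_velocity, noise, num_steps=3)
```

Because the toy velocity field is exactly straight, three Euler steps recover the target up to floating-point error. A learned field is only approximately straight, which is why the step count trades accuracy against latency.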
Abstract
Accurate medical image segmentation requires both long-range contextual reasoning and precise boundary delineation, a task where existing transformer- and diffusion-based paradigms are frequently bottlenecked by quadratic computational complexity and prohibitive inference latency. We propose RF-HiT, a Rectified Flow Hierarchical Transformer that integrates an hourglass transformer backbone with a multi-scale hierarchical encoder for anatomically guided feature conditioning. Unlike prior diffusion-based approaches, RF-HiT leverages rectified flow with efficient transformer blocks to achieve linear complexity while requiring only a few discretization steps. The model further fuses conditioning features across resolutions via learnable interpolation, enabling effective multi-scale representation with minimal computational overhead. As a result, RF-HiT achieves a strong…
