Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction
Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

TL;DR
This paper introduces ReMamba, a novel approach leveraging multi-directional state space models and multi-modal data to enhance fine-grained 3D ultrasound reconstruction, significantly outperforming existing methods.
Contribution
The paper presents a multi-scale spatio-temporal learning framework with adaptive fusion and online alignment strategies for improved freehand 3D ultrasound reconstruction.
Findings
Significant performance improvements over existing methods.
Effective multi-modal fusion with inertial measurement units.
Enhanced long-range dependency modeling with state space models.
Abstract
Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly relied on coarse-grained spatial features and separate temporal dependency learning, and therefore struggle with fine-grained spatio-temporal learning. Mining spatio-temporal information at fine-grained scales is extremely challenging because long-range dependencies are difficult to learn. In this context, we propose a novel method that exploits the long-range dependency modeling capabilities of the state space model (SSM) to address this challenge. Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM. Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units as auxiliary temporal information to enhance spatio-temporal perception. Last,…
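To make the idea of scanning a sequence with an SSM in more than one direction concrete, here is a minimal sketch. It is not the ReMamba architecture: it assumes a scalar, time-invariant linear state space recurrence (h_t = a*h_{t-1} + b*x_t, y_t = c*h_t) and simply sums a forward and a backward pass. The function names `ssm_scan` and `bidirectional_scan` and the parameters `a`, `b`, `c` are hypothetical and chosen only for illustration.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear SSM over xs: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h = 0.0  # hidden state, initialised to zero (an assumption of this sketch)
    ys = []
    for x in xs:
        h = a * h + b * x  # state update carries information from earlier steps
        ys.append(c * h)   # readout of the current state
    return ys


def bidirectional_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Sum a forward scan and a backward scan so each step sees both directions."""
    forward = ssm_scan(xs, a, b, c)
    # Scan the reversed sequence, then flip the outputs back to the original order.
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    # An impulse at the first frame decays along the forward scan.
    print(bidirectional_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
```

With |a| < 1, an input's influence decays geometrically along the scan direction, so adding a second direction lets every frame draw on context from both earlier and later frames. A real model would use learned, input-dependent, multi-channel parameters rather than fixed scalars.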
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
