Swin-X2S: Reconstructing 3D Shape from 2D Biplanar X-ray with Swin Transformers

Kuan Liu, Zongyuan Ying, Jie Jin, Dongyan Li, Ping Huang, Wenjian Wu, Zhe Chen, Jin Qi, Yong Lu, Lianfu Deng, and Bo Chen

arXiv:2501.05961 · cs.CV · January 13, 2025


TL;DR

Swin-X2S is an end-to-end deep learning approach that uses Swin Transformers and cross-attention to accurately reconstruct 3D anatomical shapes from 2D biplanar X-ray images, improving clinical diagnostic processes.
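For readers unfamiliar with the encoder backbone: a Swin Transformer computes self-attention inside local, non-overlapping windows and, in alternate blocks, cyclically shifts the feature map before partitioning so that information crosses window borders. The NumPy sketch below shows only the window-partitioning step on a single 2D feature map. The function name `window_partition` and the (H, W, C) layout are illustrative assumptions, not code from the paper, and the attention mask that Swin applies after the shift is omitted.

```python
import numpy as np

def window_partition(x, window, shift=0):
    """Split a 2D feature map into non-overlapping windows, Swin-style (sketch).

    x: (H, W, C) feature map; H and W must be divisible by `window`.
    shift: if > 0, cyclically roll the map by -shift on both spatial axes
    first, mimicking the shifted-window step (no attention mask applied here).
    Returns: (num_windows, window * window, C) token groups; self-attention
    would then be computed independently inside each group.
    """
    h, w, c = x.shape
    if h % window or w % window:
        raise ValueError("H and W must be divisible by window")
    if shift:
        x = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    # (nH, window, nW, window, C): split each spatial axis into blocks
    x = x.reshape(h // window, window, w // window, window, c)
    # (nH, nW, window, window, C): group the two block indices together
    x = x.transpose(0, 2, 1, 3, 4)
    # Flatten each window into a sequence of window*window tokens
    return x.reshape(-1, window * window, c)
```

On a 4×4 single-channel map with `window=2`, this yields four windows of four tokens each; with `shift=1`, the first window starts one row and one column further in, wrapping around the border.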

Contribution

The paper introduces Swin-X2S, a novel encoder-decoder architecture with a dimension-expanding module for direct 3D reconstruction from 2D X-rays, outperforming previous methods.
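This page does not describe how the dimension-expanding module works. As a hedged illustration of the general idea of lifting 2D pixels to 3D voxels, one simple option is a per-pixel linear projection of the channel axis whose output is then split into a new depth axis. The function `expand_2d_to_3d` and its parameters are hypothetical; the actual Swin-X2S module may work differently.

```python
import numpy as np

def expand_2d_to_3d(feat2d, weight, depth):
    """Hypothetical dimension-expanding step: lift a 2D feature map to 3D.

    feat2d: (C_in, H, W) 2D feature map from the encoder.
    weight: (C_out * depth, C_in) channel projection (learned in a real model).
    depth:  size of the new depth axis D.
    Returns: (C_out, D, H, W) voxel feature volume.
    """
    _, h, w = feat2d.shape
    out_ch = weight.shape[0]
    if out_ch % depth != 0:
        raise ValueError("number of weight rows must be divisible by depth")
    # Per-pixel linear projection of channels: (C_out * D, H, W)
    projected = np.einsum("oc,chw->ohw", weight, feat2d)
    # Split the projected channels into (C_out, D), creating the depth axis
    return projected.reshape(out_ch // depth, depth, h, w)
```

With an 8-channel 16×16 map and a (64, 8) projection, `expand_2d_to_3d(feat, w, depth=16)` returns a (4, 16, 16, 16) volume. In this sketch the projection decides how each pixel's features are distributed along the depth axis that a single X-ray view cannot observe directly.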

Findings

1. Significant improvements in segmentation and labeling metrics.
2. Effective across multiple anatomies and datasets.
3. Improved accuracy of clinically relevant parameters.
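This page does not name the segmentation and labeling metrics used. The Dice similarity coefficient, 2|P ∩ T| / (|P| + |T|), is a common choice for comparing voxel segmentations, and computing it separately for each label extends it to multi-label outputs. A minimal sketch, assuming integer label volumes with 0 as background; `per_label_dice` is a hypothetical helper, not the paper's evaluation code.

```python
import numpy as np

def per_label_dice(pred, target, labels):
    """Dice similarity per anatomical label for integer label volumes (sketch).

    pred, target: integer arrays of the same shape (e.g. D x H x W voxels),
    where each value is a label id and 0 is background.
    labels: iterable of label ids to score.
    Returns {label: dice}; a label absent from both volumes scores 1.0.
    """
    scores = {}
    for lab in labels:
        p = pred == lab
        t = target == lab
        denom = p.sum() + t.sum()
        inter = np.logical_and(p, t).sum()
        # Convention assumed here: empty-vs-empty counts as perfect agreement
        scores[lab] = 1.0 if denom == 0 else 2.0 * inter / denom
    return scores
```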

Abstract

The conversion from 2D X-ray images to 3D shapes holds significant potential for improving diagnostic efficiency and safety. However, existing reconstruction methods often rely on hand-crafted features, manual intervention, and prior knowledge, resulting in unstable shape errors and additional processing costs. In this paper, we introduce Swin-X2S, an end-to-end deep learning method for directly reconstructing 3D segmentation and labeling from 2D biplanar orthogonal X-ray images. Swin-X2S employs an encoder-decoder architecture: the encoder leverages a 2D Swin Transformer to extract X-ray information, while the decoder employs 3D convolutions with cross-attention to integrate structural features from the orthogonal views. A dimension-expanding module bridges the encoder and decoder, ensuring a smooth conversion from 2D pixels to 3D voxels. We evaluate the proposed method through…
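The abstract states that the decoder uses cross-attention to integrate features from the two orthogonal views but gives no further detail here. The sketch below assumes both views have already been lifted to 3D feature volumes of the same shape, flattens the voxels into tokens, and lets one view's voxels attend over the other's with single-head scaled dot-product attention plus a residual connection. `cross_view_attention`, the projection matrices, and the choice of which view supplies queries are illustrative assumptions, not the paper's design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(vol_q, vol_kv, w_q, w_k, w_v):
    """Single-head cross-attention between two 3D feature volumes (sketch).

    vol_q:  (C, D, H, W) features from one view (e.g. frontal), used as queries.
    vol_kv: (C, D, H, W) features from the orthogonal view, used as keys/values.
    w_q, w_k, w_v: (C, C) projection matrices (learned in a real model).
    Returns a (C, D, H, W) volume: vol_q plus the attended features.
    """
    c = vol_q.shape[0]
    spatial = vol_q.shape[1:]
    # Flatten voxels into token sequences of shape (N, C), N = D * H * W
    q_tokens = vol_q.reshape(c, -1).T
    kv_tokens = vol_kv.reshape(c, -1).T
    q = q_tokens @ w_q
    k = kv_tokens @ w_k
    v = kv_tokens @ w_v
    # Scaled dot-product attention: each query voxel attends over all key voxels
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)
    fused = attn @ v  # (N, C)
    # Residual connection, then restore the (C, D, H, W) layout
    return (q_tokens + fused).T.reshape(c, *spatial)
```

Attending over every voxel costs O(N²) in the number of voxels N = D·H·W, so a practical decoder would likely apply this at a coarse resolution or within local windows; the sketch ignores that to keep the mechanism visible.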


Taxonomy

Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction

Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer