Swin-X2S: Reconstructing 3D Shape from 2D Biplanar X-ray with Swin Transformers
Kuan Liu, Zongyuan Ying, Jie Jin, Dongyan Li, Ping Huang, Wenjian Wu, Zhe Chen, Jin Qi, Yong Lu, Lianfu Deng, and Bo Chen

TL;DR
Swin-X2S is an end-to-end deep learning approach that uses Swin Transformers and cross-attention to accurately reconstruct 3D anatomical shapes from 2D biplanar X-ray images, improving clinical diagnostic processes.
Contribution
The paper introduces Swin-X2S, a novel encoder-decoder architecture with a dimension-expanding module for direct 3D reconstruction from 2D X-rays, outperforming previous methods.
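To make the idea of a dimension-expanding step concrete, here is a minimal NumPy sketch of one simple way to lift 2D features into a 3D grid: each view's feature map is copied along the axis that view cannot see, and the two volumes are stacked on the channel axis. This is an illustrative assumption, not the module described in the paper; the function names `expand_view` and `fuse_biplanar`, the axis layout (C, H, W, D), and the channel concatenation are all hypothetical choices.

```python
import numpy as np


def expand_view(feat, new_axis, size):
    """Copy a 3-axis feature map `size` times along a newly inserted axis."""
    return np.repeat(np.expand_dims(feat, axis=new_axis), size, axis=new_axis)


def fuse_biplanar(ap_feat, lat_feat):
    """Lift two orthogonal 2D feature maps into one (2C, H, W, D) volume.

    ap_feat:  (C, H, W) features from the frontal view (depth is unknown).
    lat_feat: (C, H, D) features from the lateral view (width is unknown).
    Assumed layout for illustration only; not taken from the paper.
    """
    c, h, w = ap_feat.shape
    c_lat, h_lat, d = lat_feat.shape
    if (c, h) != (c_lat, h_lat):
        raise ValueError("views must share channel count and height")
    ap_vol = expand_view(ap_feat, new_axis=3, size=d)    # (C, H, W, D)
    lat_vol = expand_view(lat_feat, new_axis=2, size=w)  # (C, H, W, D)
    return np.concatenate([ap_vol, lat_vol], axis=0)     # (2C, H, W, D)


rng = np.random.default_rng(0)
volume = fuse_biplanar(rng.normal(size=(4, 8, 8)), rng.normal(size=(4, 8, 6)))
print(volume.shape)  # (8, 8, 8, 6)
```

Plain copying carries no depth information by itself; in a learned model, later 3D layers would have to resolve where along the copied axis each structure actually lies.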
Findings
Significant improvements in segmentation and labeling metrics.
Effective across multiple anatomies and datasets.
Improved accuracy of clinically relevant parameters.
Abstract
The conversion from 2D X-ray to 3D shape holds significant potential for improving diagnostic efficiency and safety. However, existing reconstruction methods often rely on hand-crafted features, manual intervention, and prior knowledge, resulting in unstable shape errors and additional processing costs. In this paper, we introduce Swin-X2S, an end-to-end deep learning method for directly reconstructing 3D segmentation and labeling from 2D biplanar orthogonal X-ray images. Swin-X2S employs an encoder-decoder architecture: the encoder leverages a 2D Swin Transformer for X-ray information extraction, while the decoder employs 3D convolution with cross-attention to integrate structural features from orthogonal views. A dimension-expanding module is introduced to bridge the encoder and decoder, ensuring a smooth conversion from 2D pixels to 3D voxels. We evaluate the proposed method through…
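The cross-attention mentioned in the abstract can be illustrated with generic scaled dot-product attention, in which tokens from one view act as queries and tokens from the orthogonal view supply keys and values. The sketch below is a single-head NumPy version under stated assumptions: the token counts, the feature width of 32, and the random projection matrices `w_q`, `w_k`, `w_v` are placeholders, and nothing here reproduces how Swin-X2S wires attention into its 3D decoder.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head attention: `queries` attend to `context`.

    queries: (N_q, d_model) tokens from one view.
    context: (N_c, d_model) tokens from the other view.
    Returns the attended features (N_q, d_head) and the weights (N_q, N_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot product
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 32                                  # hypothetical feature width
ap_tokens = rng.normal(size=(16, d_model))    # placeholder frontal-view tokens
lat_tokens = rng.normal(size=(12, d_model))   # placeholder lateral-view tokens
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))

fused, weights = cross_attention(ap_tokens, lat_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)  # (16, 32) (16, 12)
```

Each row of `weights` describes how strongly one frontal-view token draws on every lateral-view token, which is the mechanism that lets features from one projection be informed by the other.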
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction
Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer
