SnakeVoxFormer: Transformer-based Single Image Voxel Reconstruction with Run Length Encoding
Jae Joong Lee, Bedrich Benes

TL;DR
SnakeVoxFormer is a transformer-based method for reconstructing a 3D voxel model from a single image. It uses run-length encoding to flatten the voxel grid into a compact 1D token sequence and reports higher accuracy than previous methods.
Contribution
The paper presents a voxel encoding strategy that combines run-length encoding and dictionary encoding with a transformer model, reducing the data to about 1% of its original size and improving reconstruction performance.
Findings
Improves state-of-the-art accuracy by up to 19.8%.
The encoded representation uses only about 1% of the original data size.
Demonstrates effectiveness of voxel traversal strategies.
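As a rough illustration of why traversal order matters for run-length encoding, the sketch below flattens a small voxel grid in two ways and counts the resulting runs. The function names (`raster_flatten`, `snake_flatten`, `count_runs`) and the particular snake order are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np


def count_runs(seq):
    """Count maximal runs of equal consecutive values in a 1D sequence."""
    seq = np.asarray(seq)
    if seq.size == 0:
        return 0
    # Every position where the value changes starts a new run.
    return int(np.count_nonzero(seq[1:] != seq[:-1])) + 1


def raster_flatten(vox):
    """Plain row-major flattening: every row is read in the same direction."""
    return vox.reshape(-1)


def snake_flatten(vox):
    """Hypothetical snake (boustrophedon) order over a 3D grid.

    Rows alternate direction, and odd slices visit their rows in reverse,
    so consecutive elements of the output are always neighboring voxels.
    """
    parts = []
    row_index = 0
    for x in range(vox.shape[0]):
        ys = range(vox.shape[1])
        if x % 2 == 1:
            ys = reversed(ys)
        for y in ys:
            row = vox[x, y, :]
            parts.append(row if row_index % 2 == 0 else row[::-1])
            row_index += 1
    return np.concatenate(parts)


# Toy 1 x 2 x 4 occupancy grid: two identical rows.
grid = np.array([[[0, 0, 1, 1],
                  [0, 0, 1, 1]]], dtype=np.uint8)

print(count_runs(raster_flatten(grid)))  # 4
print(count_runs(snake_flatten(grid)))   # 3
```

On this toy grid the snake order yields 3 runs against 4 for raster order. Other shapes can favor a different order, which is why the choice of traversal is worth evaluating.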
Abstract
Deep learning-based 3D object reconstruction has achieved unprecedented results. Among deep models, the transformer has shown outstanding performance in many computer vision applications. We introduce SnakeVoxFormer, a novel method for 3D object reconstruction in voxel space from a single image using a transformer. The input to SnakeVoxFormer is a 2D image, and the result is a 3D voxel model. The key novelty of our approach is a run-length encoding that traverses the voxel space (like a snake) and encodes wide spatial differences into a 1D structure suitable for transformer encoding. We then use dictionary encoding to convert the discovered RLE blocks into tokens for the transformer. The 1D representation is a lossless compression of the 3D shape data that uses only about 1% of the original data size. We show how different…
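The pipeline described in the abstract (flatten to 1D, run-length encode, map runs to dictionary tokens) can be sketched as follows. This is a minimal sketch under stated assumptions: an RLE block is taken to be a (value, run length) pair, plain row-major flattening stands in for the snake traversal, and all function names are hypothetical. The paper's actual block definition and vocabulary construction may differ.

```python
import numpy as np


def rle_encode(seq):
    """Run-length encode a 1D sequence into (value, run_length) pairs."""
    runs = []
    for v in seq:
        v = int(v)
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def rle_decode(runs):
    """Invert rle_encode, recovering the original 1D occupancy sequence."""
    if not runs:
        return np.zeros(0, dtype=np.uint8)
    return np.concatenate([np.full(n, v, dtype=np.uint8) for v, n in runs])


def build_vocab(list_of_runs):
    """Assumed dictionary encoding: give each distinct run an integer id."""
    vocab = {}
    for runs in list_of_runs:
        for r in runs:
            vocab.setdefault(r, len(vocab))
    return vocab


def tokenize(runs, vocab):
    """Map runs to token ids."""
    return [vocab[r] for r in runs]


def detokenize(tokens, vocab):
    """Map token ids back to runs."""
    inverse = {i: r for r, i in vocab.items()}
    return [inverse[t] for t in tokens]


# Toy 4 x 4 x 4 grid with a 2 x 2 x 2 solid cube in the middle.
vox = np.zeros((4, 4, 4), dtype=np.uint8)
vox[1:3, 1:3, 1:3] = 1

flat = vox.reshape(-1)  # stand-in for the snake traversal
runs = rle_encode(flat)
vocab = build_vocab([runs])
tokens = tokenize(runs, vocab)

# Lossless round trip: tokens -> runs -> flat sequence -> voxel grid.
restored = rle_decode(detokenize(tokens, vocab)).reshape(vox.shape)
assert np.array_equal(restored, vox)
```

The round-trip check at the end is the point of the sketch: the token sequence carries enough information to rebuild the voxel grid exactly, which is what makes the 1D representation lossless.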
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques
