SnakeVoxFormer: Transformer-based Single Image Voxel Reconstruction with Run Length Encoding

Jae Joong Lee; Bedrich Benes

arXiv:2303.16293 · cs.CV · March 30, 2023 · 1 citation

Open Access

TL;DR

SnakeVoxFormer introduces a transformer-based method for 3D voxel reconstruction from a single image, utilizing run-length encoding to efficiently encode spatial differences and improve accuracy over previous methods.

Contribution

The paper presents a voxel encoding strategy that combines run-length encoding with a transformer model, reducing the voxel data to about 1% of its original size and improving reconstruction performance.

Findings

1. Improves state-of-the-art accuracy by up to 19.8%.

2. Encodes voxel data in about 1% of the original data size.

3. Demonstrates the effectiveness of voxel traversal strategies.

Abstract

Deep learning-based 3D object reconstruction has achieved unprecedented results. Among deep models, the transformer has shown outstanding performance in many computer vision applications. We introduce SnakeVoxFormer, a novel method for 3D object reconstruction in voxel space from a single image using the transformer. The input to SnakeVoxFormer is a 2D image, and the result is a 3D voxel model. The key novelty of our approach is the use of run-length encoding (RLE) that traverses the voxel space like a snake and encodes wide spatial differences into a 1D structure suitable for transformer encoding. We then use dictionary encoding to convert the discovered RLE blocks into tokens that are used by the transformer. The 1D representation is a lossless compression of the 3D shape data that uses only about 1% of the original data size. We show how different…
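
To make the encoding pipeline described in the abstract more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a binary occupancy grid stored as nested lists indexed `[z][y][x]`, one particular snake (boustrophedon) traversal order, and a vocabulary built from a single shape. The function names `snake_order`, `rle_encode`, `rle_decode`, `encode`, and `decode` are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def snake_order(d, h, w):
    """Yield (z, y, x) indices so that consecutive cells are grid neighbours.

    Assumed traversal: x reverses direction on every row, and y reverses
    direction on every slice, so the path never jumps across the grid.
    """
    row = 0  # rows visited so far; its parity sets the x direction
    for z in range(d):
        ys = range(h) if z % 2 == 0 else range(h - 1, -1, -1)
        for y in ys:
            xs = range(w) if row % 2 == 0 else range(w - 1, -1, -1)
            for x in xs:
                yield (z, y, x)
            row += 1


def rle_encode(bits):
    """Collapse a 1D sequence into (value, run_length) pairs."""
    return [(v, sum(1 for _ in g)) for v, g in groupby(bits)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    return [v for v, n in runs for _ in range(n)]


def encode(vox):
    """Snake-flatten a voxel grid, run-length encode it, and map runs to token ids."""
    d, h, w = len(vox), len(vox[0]), len(vox[0][0])
    bits = [vox[z][y][x] for z, y, x in snake_order(d, h, w)]
    runs = rle_encode(bits)
    vocab = {}  # dictionary encoding: each distinct run gets an integer id
    for run in runs:
        vocab.setdefault(run, len(vocab))
    return [vocab[run] for run in runs], vocab


def decode(tokens, vocab, shape):
    """Invert encode(): tokens -> runs -> flat bits -> voxel grid."""
    d, h, w = shape
    inverse = {token: run for run, token in vocab.items()}
    bits = rle_decode([inverse[t] for t in tokens])
    vox = [[[0] * w for _ in range(h)] for _ in range(d)]
    for bit, (z, y, x) in zip(bits, snake_order(d, h, w)):
        vox[z][y][x] = bit
    return vox


# Toy example: a solid 2x2x2 cube centred in a 4x4x4 grid.
cube = [[[1 if all(1 <= c <= 2 for c in (z, y, x)) else 0
          for x in range(4)] for y in range(4)] for z in range(4)]
tokens, vocab = encode(cube)
restored = decode(tokens, vocab, (4, 4, 4))
```

Decoding the token sequence reproduces the original grid exactly, which is the sense in which the 1D representation is lossless. The compression ratio on this toy grid says nothing about the roughly 1% figure reported for the paper's data, and a vocabulary meant for training a transformer would presumably be shared across the whole dataset rather than built per shape as it is here.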

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Medical Image Segmentation Techniques