ObitoNet: Multimodal High-Resolution Point Cloud Reconstruction

Apoorv Thapliyal; Vinay Lanka; Swathi Baskaran

arXiv:2412.18775·cs.CV·December 30, 2024

TL;DR

ObitoNet introduces a multimodal transformer-based framework that combines image semantics and geometric details to achieve high-resolution point cloud reconstruction, improving robustness in sparse or noisy data scenarios.

Contribution

It presents a novel integration of Vision Transformers and point cloud tokenization with a transformer decoder for enhanced 3D reconstruction.
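As an illustration of how image features and point cloud tokens might be integrated, the sketch below implements plain scaled dot-product cross-attention, the mechanism the abstract names for fusing the two modalities. It is a minimal, dependency-free sketch rather than ObitoNet's implementation: the function names are hypothetical, the learned query/key/value projections and multiple heads of a real transformer layer are omitted, and the choice of point tokens as queries with image tokens as keys and values is an assumption the source does not confirm.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query produces a weighted average of `values`, where the weights
    come from the similarity between that query and every key.
    Learned projections are omitted here (assumption, for brevity).
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy tokens: 2 point tokens attend over 3 image tokens.
point_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused_tokens = cross_attention(point_tokens, image_tokens, image_tokens)
```

The output has one fused vector per query token, so in this arrangement the number of point tokens is preserved while each one absorbs image context.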

Findings

01 Effective in reconstructing high-resolution point clouds

02 Robust performance with sparse and noisy data

03 Combines semantic and geometric features successfully

Abstract

ObitoNet employs a cross-attention mechanism to integrate multimodal inputs: Vision Transformers (ViT) extract semantic features from images, while a point cloud tokenizer processes geometric information using Farthest Point Sampling (FPS) and K-Nearest Neighbors (KNN) to capture spatial structure. The learned multimodal features are fed into a transformer-based decoder for high-resolution point cloud reconstruction. This approach leverages the complementary strengths of both modalities, rich image features and precise geometric details, ensuring robust point cloud generation even in challenging conditions such as sparse or noisy data.
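The FPS and KNN steps of the point cloud tokenizer can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' code: the function names are hypothetical, the brute-force distance computations are chosen for clarity rather than speed, and expressing each patch relative to its center is an assumption borrowed from common point-patch tokenizers. A full tokenizer would also pass each patch through a small learned network to obtain an embedding, which is omitted here.

```python
import math


def farthest_point_sampling(points, num_centers, start_index=0):
    """Greedily pick indices of `num_centers` points that are far apart."""
    if not 0 < num_centers <= len(points):
        raise ValueError("num_centers must be between 1 and len(points)")
    chosen = [start_index]
    # min_dist[i] is the distance from point i to its nearest chosen center.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(chosen) < num_centers:
        # The next center is the point farthest from all chosen centers.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def knn_group(points, center_index, k):
    """Indices of the k points nearest to the center (the center included)."""
    center = points[center_index]
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    return order[:k]


def tokenize(points, num_centers, k):
    """Split a cloud into `num_centers` local patches of `k` points each.

    Patch coordinates are made relative to the patch center (assumption).
    """
    patches = []
    for c in farthest_point_sampling(points, num_centers):
        cx, cy, cz = points[c]
        patches.append([
            (points[i][0] - cx, points[i][1] - cy, points[i][2] - cz)
            for i in knn_group(points, c, k)
        ])
    return patches


# Hypothetical toy cloud with two well-separated clusters.
cloud = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 0, 0), (10, 1, 0), (11, 0, 0)]
patches = tokenize(cloud, num_centers=2, k=3)
```

FPS spreads the patch centers across the whole shape, while KNN gathers the local neighborhood around each center, so together they cover the cloud with a fixed number of fixed-size groups that a transformer can treat as tokens.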

Code & Models

Repositories

vinay-lanka/ObitoNet (PyTorch, official)

Taxonomy

Topics: Remote Sensing and LiDAR Applications · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction

Methods: Softmax · Attention Is All You Need