OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data

Yijie Zhou; Likun Cai; Xianhui Cheng; Zhongxue Gan; Xiangyang Xue; Wenchao Ding

arXiv:2310.13398·cs.CV·October 23, 2023·1 cites

TL;DR

OpenAnnotate3D is an open-source system that automatically generates multi-modal 2D and 3D annotations using large language and vision-language models, significantly improving annotation efficiency for 3D data in real-world applications.

Contribution

It introduces one of the first open-vocabulary auto-labeling systems for multi-modal 3D data, integrating LLMs and VLMs for comprehensive annotation capabilities.

Findings

01  Significantly improves annotation efficiency over manual methods

02  Provides accurate open-vocabulary annotations for multi-modal 3D data

03  Demonstrates effectiveness on real-world datasets

Abstract

In the era of big data and large models, automatic annotating functions for multi-modal data are of great significance for real-world AI-driven applications, such as autonomous driving and embodied AI. Unlike traditional closed-set annotation, open-vocabulary annotation is essential to achieve human-level cognition capability. However, there are few open-vocabulary auto-labeling systems for multi-modal 3D data. In this paper, we introduce OpenAnnotate3D, an open-source open-vocabulary auto-labeling system that can automatically generate 2D masks, 3D masks, and 3D bounding box annotations for vision and point cloud data. Our system integrates the chain-of-thought capabilities of Large Language Models (LLMs) and the cross-modality capabilities of vision-language models (VLMs). To the best of our knowledge, OpenAnnotate3D is one of the pioneering works for open-vocabulary multi-modal 3D…
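
The abstract lists 2D masks, 3D masks, and 3D bounding boxes as outputs but does not say here how the 2D and 3D annotations are related. One common way to connect them, shown below purely as an illustrative assumption and not as the authors' method, is to project point cloud points into the image with a pinhole camera model, keep the points that land inside a 2D mask, and fit an axis-aligned box around them. The function names, the camera-frame convention, and the intrinsics matrix `K` are all hypothetical.

```python
import numpy as np


def lift_mask_to_points(points_cam, mask, K):
    """Return a boolean per-point mask: True where a point projects inside the 2D mask.

    Assumed inputs (hypothetical, not taken from the paper):
      points_cam: (N, 3) points already in the camera frame, z pointing forward.
      mask:       (H, W) boolean 2D instance mask.
      K:          (3, 3) pinhole intrinsics matrix.
    """
    h, w = mask.shape
    z = points_cam[:, 2]
    in_front = z > 1e-6
    # Guard the division so points behind the camera do not produce inf/nan.
    safe_z = np.where(in_front, z, 1.0)
    u = K[0, 0] * points_cam[:, 0] / safe_z + K[0, 2]
    v = K[1, 1] * points_cam[:, 1] / safe_z + K[1, 2]
    ui = np.floor(u).astype(int)
    vi = np.floor(v).astype(int)
    in_image = in_front & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    selected = np.zeros(len(points_cam), dtype=bool)
    # Look up the 2D mask only for points that fall inside the image bounds.
    selected[in_image] = mask[vi[in_image], ui[in_image]]
    return selected


def axis_aligned_box(points):
    """Return (min_corner, max_corner) of an (N, 3) array, or None if it is empty."""
    if len(points) == 0:
        return None
    return points.min(axis=0), points.max(axis=0)
```

A real pipeline would also need the LiDAR-to-camera extrinsics, occlusion handling, and usually an oriented rather than axis-aligned box; those are left out to keep the sketch short.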

Code & Models

Repositories

fudan-projecttitan/openannotate3d
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques