CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds

Keonwoo Kim; Yeongjae Cho; Taebaek Hwang; Minsoo Jo; Sangdo Han

arXiv:2501.03879 · cs.CV · January 8, 2025

TL;DR

CL3DOR applies odds-ratio-based contrastive learning to high-resolution point clouds so that the visual and textual content used to train 3D large multimodal models is more specific and clearer, and it reports state-of-the-art results on 3D scene understanding benchmarks.

Contribution

The paper presents a contrastive learning method for 3D large multimodal models that increases point cloud density and uses the odds ratio to improve cross-modal understanding.

Findings

01. Achieves state-of-the-art performance on 3D scene understanding benchmarks.

02. Effectively leverages hard negative responses for improved model training.

03. Demonstrates the importance of high-resolution point clouds for 3D multimodal tasks.

Abstract

Recent research has demonstrated that Large Language Models (LLMs) are not limited to text-only tasks but can also function as multimodal models across various modalities, including audio, images, and videos. In particular, research on 3D Large Multimodal Models (3D LMMs) is making notable strides, driven by the potential of processing higher-dimensional data like point clouds. However, upon closer examination, we find that the visual and textual content within each sample of existing training datasets lacks both high informational granularity and clarity, which serves as a bottleneck for precise cross-modal understanding. To address these issues, we propose CL3DOR, Contrastive Learning for 3D large multimodal models via Odds ratio on high-Resolution point clouds, designed to ensure greater specificity and clarity in both visual and textual content. Specifically, we increase the density…
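
The abstract is cut off before it states the training objective, so the following is only a minimal sketch of one way an odds-ratio contrastive term can be written. It assumes the log-odds-ratio form used in odds ratio preference optimization (ORPO), applied to a reference response and a hard negative response for the same point cloud. The function names and the use of length-averaged log-probabilities are illustrative assumptions, not the authors' implementation.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob).

    avg_logprob is assumed to be a length-averaged token log-probability,
    so it must be strictly negative (0 < p < 1).
    """
    p = math.exp(avg_logprob)
    # log p - log(1 - p); log1p keeps precision when p is small.
    return avg_logprob - math.log1p(-p)


def odds_ratio_loss(logp_pos: float, logp_neg: float) -> float:
    """ORPO-style term: -log sigmoid(log_odds(pos) - log_odds(neg)).

    logp_pos: average log-probability of the reference (positive) response.
    logp_neg: average log-probability of the hard negative response.
    """
    z = log_odds(logp_pos) - log_odds(logp_neg)
    # -log sigmoid(z) equals softplus(-z); use the overflow-safe form.
    x = -z
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Hypothetical values: the model prefers the reference response.
    print(odds_ratio_loss(math.log(0.6), math.log(0.2)))
    # Hypothetical values: the model prefers the hard negative.
    print(odds_ratio_loss(math.log(0.2), math.log(0.6)))
```

Under this assumed form, the term shrinks as the model assigns higher odds to the reference response than to the hard negative, and it equals log 2 when the two are equally likely.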

Taxonomy

Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Human Pose and Action Recognition

Methods: Contrastive Learning