Formula-Supervised Visual-Geometric Pre-training

Ryosuke Yamada; Kensho Hara; Hirokatsu Kataoka; Koshi Makihara; Nakamasa Inoue; Rio Yokota; Yutaka Satoh

arXiv:2409.13535·cs.CV·September 23, 2024

TL;DR

This paper introduces FSVGP, a synthetic pre-training method that generates aligned images and point clouds from mathematical formulas, improving unified visual-geometric recognition while relying less on real data.

Contribution

The paper presents a novel synthetic pre-training approach that trains a unified transformer model on both images and point clouds using formula-generated data, reducing the need for real data and annotations.

Findings

01 FSVGP outperforms VisualAtom and PC-FractalDB on six recognition tasks.

02 Synthetic pre-training enhances generalization in image and 3D recognition.

03 A unified transformer model effectively integrates the visual and geometric modalities.

Abstract

Throughout the history of computer vision, while research has explored the integration of images (visual) and point clouds (geometric), many advancements in image and 3D object recognition have tended to process these modalities separately. We aim to bridge this divide by integrating images and point clouds on a unified transformer model. This approach integrates the modality-specific properties of images and point clouds and achieves fundamental downstream tasks in image and 3D object recognition on a unified transformer model by learning visual-geometric representations. In this work, we introduce Formula-Supervised Visual-Geometric Pre-training (FSVGP), a novel synthetic pre-training method that automatically generates aligned synthetic images and point clouds from mathematical formulas. Through cross-modality supervision, we enable supervised pre-training between visual and…
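The idea of generating aligned images and point clouds from a formula can be illustrated with a small sketch. The excerpt above does not say which formulas FSVGP uses, so this example assumes a 3D iterated function system (a common choice in formula-driven pre-training) and a simple orthographic projection. All function names, parameters, and the use of the class label as a random seed are hypothetical and are not taken from the paper.

```python
import numpy as np


def random_ifs(rng, n_maps=4):
    """Draw hypothetical contractive affine maps x -> A x + b in 3D."""
    maps = []
    for _ in range(n_maps):
        A = rng.uniform(-1.0, 1.0, size=(3, 3))
        # Rescale so the largest singular value is 0.8 (a contraction).
        A *= 0.8 / np.linalg.norm(A, 2)
        b = rng.uniform(-1.0, 1.0, size=3)
        maps.append((A, b))
    return maps


def sample_point_cloud(maps, n_points, rng, burn_in=20):
    """Chaos game: repeatedly apply a randomly chosen map to one point."""
    x = np.zeros(3)
    points = np.empty((n_points, 3))
    for i in range(-burn_in, n_points):
        A, b = maps[rng.integers(len(maps))]
        x = A @ x + b
        if i >= 0:  # discard the first burn_in iterates
            points[i] = x
    return points


def project_to_image(points, size=64):
    """Orthographic projection onto the xy-plane as a binary image."""
    xy = points[:, :2]
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    norm = (xy - lo) / np.maximum(hi - lo, 1e-8)
    idx = np.minimum((norm * size).astype(int), size - 1)
    image = np.zeros((size, size), dtype=np.uint8)
    image[idx[:, 1], idx[:, 0]] = 1
    return image


def make_sample(label, n_points=2048, size=64):
    """Return one aligned (image, point cloud, label) triple.

    Assumption: the integer label seeds the generator, so the formula
    parameters themselves define the class shared by both modalities.
    """
    rng = np.random.default_rng(label)
    maps = random_ifs(rng)
    points = sample_point_cloud(maps, n_points, rng)
    return project_to_image(points, size), points, label


image, points, label = make_sample(label=3)
```

Because the image is rendered from the same points, both modalities share one label without any manual annotation, which is the property the abstract refers to as cross-modality supervision.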

Taxonomy

Topics: Augmented Reality Applications