MVIP -- A Dataset and Methods for Application Oriented Multi-View and Multi-Modal Industrial Part Recognition

Paul Koch; Marian Schlüter; Jörg Krüger

arXiv:2502.15448·cs.CV·February 24, 2025

TL;DR

MVIP introduces a comprehensive multi-modal, multi-view dataset tailored for industrial part recognition, aiming to advance transferability, modality fusion, and synthetic data generation methods for practical industrial applications.

Contribution

The paper presents MVIP, a novel dataset combining calibrated RGBD multi-view data with object context, and establishes a benchmark for application-oriented industrial part recognition challenges.

Findings

01 MVIP enables evaluation of multi-modal recognition methods.

02 The dataset highlights challenges like limited data and similar-looking parts.

03 It facilitates research on modality fusion and synthetic data in industrial contexts.

Abstract

We present MVIP, a novel dataset for multi-modal and multi-view application-oriented industrial part recognition. We are the first to combine a calibrated RGBD multi-view dataset with additional object context such as physical properties, natural language, and super-classes. The current portfolio of available datasets offers a wide range of representations to design and benchmark related methods. In contrast to existing classification challenges, industrial recognition applications offer controlled multi-modal environments, but they also pose different problems than traditional 2D/3D classification challenges. Frequently, industrial applications must deal with a small or growing amount of training data, visually similar parts, and varying object sizes, while requiring a robust, near-100% top-5 accuracy under cost and time constraints. Current methods tackle such…
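
The abstract names top-5 accuracy as the target metric. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the following counts a prediction as correct when the true class is among the five highest-scoring classes. The function name `top_k_accuracy` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical toy data: 3 samples, 6 classes (not taken from MVIP).
toy_scores = [
    [0.10, 0.50, 0.20, 0.90, 0.30, 0.00],  # true class 3 is ranked first
    [0.80, 0.10, 0.70, 0.20, 0.60, 0.05],  # true class 5 is ranked last
    [0.30, 0.20, 0.10, 0.60, 0.50, 0.40],  # true class 1 is ranked fifth
]
toy_labels = [3, 5, 1]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top5 = top_k_accuracy(toy_scores, toy_labels, k=5)
print(f"top-1: {top1:.3f}, top-5: {top5:.3f}")
```

On this toy data the third sample counts as a top-5 hit but a top-1 miss, which is why top-5 is the more forgiving metric when parts look alike.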
