MVIP -- A Dataset and Methods for Application Oriented Multi-View and Multi-Modal Industrial Part Recognition
Paul Koch, Marian Schlüter, Jörg Krüger

TL;DR
MVIP introduces a comprehensive multi-modal, multi-view dataset tailored for industrial part recognition, aiming to advance transferability, modality fusion, and synthetic data generation methods for practical industrial applications.
Contribution
The paper presents MVIP, a novel dataset combining calibrated RGBD multi-view data with object context, and establishes a benchmark for application-oriented industrial part recognition challenges.
Findings
MVIP enables evaluation of multi-modal recognition methods.
The dataset highlights challenges like limited data and similar-looking parts.
It facilitates research on modality fusion and synthetic data in industrial contexts.
Abstract
We present MVIP, a novel dataset for multi-modal and multi-view application-oriented industrial part recognition. Here we are the first to combine a calibrated RGBD multi-view dataset with additional object context such as physical properties, natural language, and super-classes. The current portfolio of available datasets offers a wide range of representations to design and benchmark related methods. In contrast to existing classification challenges, industrial recognition applications offer controlled multi-modal environments but at the same time have different problems than traditional 2D/3D classification challenges. Frequently, industrial applications must deal with a small amount or increased number of training data, visually similar parts, and varying object sizes, while requiring a robust near 100% top 5 accuracy under cost and time constraints. Current methods tackle such…
