Configurable Embodied Data Generation for Class-Agnostic RGB-D Video Segmentation
Anthony Opipari, Aravindhan K Krishnan, Shreekant Gayaka, Min Sun, Cheng-Hao Kuo, Arnie Sen, Odest Chadwicke Jenkins

TL;DR
This paper introduces a configurable data generation pipeline and a large RGB-D video dataset to enhance class-agnostic video segmentation across different robot embodiments, improving transferability and accuracy.
Contribution
We propose a novel pipeline for generating robot embodiment-specific RGB-D video data and introduce the MVPd dataset for benchmarking and research.
Findings
Finetuning on MVPd improves model transfer to specific robot configurations.
Using 3D modalities enhances segmentation accuracy and consistency.
The dataset supports embodiment-focused research in video segmentation.
Abstract
This paper presents a method for generating large-scale datasets to improve class-agnostic video segmentation across robots with different form factors. Specifically, we consider the question of whether video segmentation models trained on generic segmentation data could be more effective for particular robot platforms if robot embodiment is factored into the data generation process. To answer this question, a pipeline is formulated for using 3D reconstructions (e.g. from HM3DSem) to generate segmented videos that are configurable based on a robot's embodiment (e.g. sensor type, sensor placement, and illumination source). A resulting massive RGB-D video panoptic segmentation dataset (MVPd) is introduced for extensive benchmarking with foundation and video segmentation models, as well as to support embodiment-focused research in video segmentation. Our experimental findings demonstrate…
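As an illustration of what "configurable based on a robot's embodiment" might look like in practice, the sketch below groups the three factors named in the abstract (sensor type, sensor placement, illumination source) into a small configuration object and enumerates their combinations. It is a hypothetical sketch, not the authors' pipeline or MVPd's actual schema: the class name, field names, and example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class EmbodimentConfig:
    """Hypothetical description of one robot embodiment (not the paper's schema)."""
    sensor_type: str         # illustrative values, e.g. "pinhole" or "fisheye"
    sensor_height_m: float   # assumed stand-in for sensor placement: height above the floor
    illumination: str        # illustrative values, e.g. "ambient" or "onboard_light"


def enumerate_embodiments(sensor_types, heights_m, illuminations):
    """Return one config for every combination of the supplied options."""
    return [
        EmbodimentConfig(sensor, height, light)
        for sensor, height, light in product(sensor_types, heights_m, illuminations)
    ]


# Example: two sensors x two mounting heights x two lighting setups.
configs = enumerate_embodiments(
    ["pinhole", "fisheye"],
    [0.3, 1.2],
    ["ambient", "onboard_light"],
)
for cfg in configs:
    print(cfg)
```

Each configuration would then parameterize how videos are rendered from a 3D reconstruction; that rendering step depends on the simulator used and is deliberately left out of this sketch.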
Taxonomy
Topics: Medical Image Segmentation Techniques · Generative Adversarial Networks and Image Synthesis · Industrial Vision Systems and Defect Detection
