Semantic Segmentation and Scene Reconstruction of RGB-D Image Frames: An End-to-End Modular Pipeline for Robotic Applications
Zhiwu Zheng, Lauren Mentzer, Berk Iskender, Michael Price, Colm Prendergast, Audren Cloitre

TL;DR
This paper presents an end-to-end modular pipeline for RGB-D scene understanding that combines advanced semantic segmentation, human tracking, and efficient scene reconstruction to enhance robotic perception and interaction.
Contribution
It introduces a novel integrated pipeline that improves semantic segmentation accuracy, object boundary precision, and computational efficiency for robotic scene understanding.
Findings
Semantic segmentation accuracy comparable to state-of-the-art methods
Enhanced human tracking with re-identification capabilities
Point cloud fusion computation time reduced by a factor of 1.81
Abstract
Robots operating in unstructured environments require a comprehensive understanding of their surroundings, necessitating geometric and semantic information from sensor data. Traditional RGB-D processing pipelines focus primarily on geometric reconstruction, limiting their ability to support advanced robotic perception, planning, and interaction. A key challenge is the lack of generalized methods for segmenting RGB-D data into semantically meaningful components while maintaining accurate geometric representations. We introduce a novel end-to-end modular pipeline that integrates state-of-the-art semantic segmentation, human tracking, point-cloud fusion, and scene reconstruction. Our approach improves semantic segmentation accuracy by leveraging the foundational segmentation model SAM2 with a hybrid method that combines its mask generation with a semantic classification model, resulting in…
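The abstract does not spell out how the SAM2 masks and the semantic classification model are combined. As a minimal sketch of one plausible reading, and not the authors' method, the snippet below assumes the mask generator yields class-agnostic boolean masks and the classifier yields a per-pixel class map, then gives every mask the majority class found under it. The function name `label_masks_by_majority` and all of its parameters are hypothetical.

```python
import numpy as np


def label_masks_by_majority(masks, class_map, num_classes):
    """Give each class-agnostic mask the most frequent class beneath it.

    masks:       bool array of shape (N, H, W), e.g. from a mask generator
                 such as SAM2 (assumed input format, not from the paper).
    class_map:   int array of shape (H, W) with per-pixel class ids in
                 [0, num_classes), from a separate semantic classifier.
    Returns an (H, W) int array; pixels covered by no mask are set to -1.
    """
    labels = np.full(class_map.shape, -1, dtype=np.int64)
    # Paint large masks first so that smaller masks overwrite them on overlap.
    order = np.argsort([-int(m.sum()) for m in masks])
    for i in order:
        m = masks[i]
        if not m.any():
            continue  # skip empty masks
        votes = np.bincount(class_map[m], minlength=num_classes)
        labels[m] = votes.argmax()
    return labels
```

The design intent of such a vote is that the mask supplies the object boundary while the classifier supplies only the label, so isolated misclassified pixels inside a mask are outvoted.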
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Autonomous Vehicle Technology and Safety
