3D Semantic Scene Perception using Distributed Smart Edge Sensors
Simon Bultmann, Sven Behnke

TL;DR
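This paper introduces a real-time 3D semantic scene perception system built on distributed smart edge sensors with embedded CNNs. The sensors interpret images on-device and send only semantic data to a central backend, which fuses the multiple viewpoints into a single 3D semantic scene model while keeping raw images on the sensors to preserve privacy.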
Contribution
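The novel system combines on-device CNN inference on the edge sensors with real-time 3D scene reconstruction on a central backend. Because only semantic information leaves the sensors, it requires less bandwidth and poses fewer privacy risks than streaming raw images to a central server.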
Findings
Provides semantically annotated 3D geometry in real time
Accurately estimates 3D poses of multiple persons
Reduces bandwidth and privacy risks
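To illustrate how depth-augmented 2D keypoints from several sensors can yield one 3D joint position, here is a minimal sketch assuming a standard pinhole camera model, known camera-to-world extrinsics for each sensor, and a simple confidence-weighted average as the fusion step. The names `Camera`, `backproject`, `to_world`, and `fuse_keypoint` are hypothetical, and the weighted mean is a stand-in chosen for clarity, not necessarily the fusion method used in the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    # Hypothetical pinhole intrinsics (pixels) and camera-to-world extrinsics.
    fx: float
    fy: float
    cx: float
    cy: float
    R: Sequence[Sequence[float]]  # 3x3 rotation, camera frame -> world frame
    t: Vec3                       # camera origin expressed in world coordinates


def backproject(cam: Camera, u: float, v: float, depth: float) -> Vec3:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cam.cx) * depth / cam.fx
    y = (v - cam.cy) * depth / cam.fy
    return (x, y, depth)


def to_world(cam: Camera, p: Vec3) -> Vec3:
    """Transform a camera-frame point into the shared (allocentric) frame: R @ p + t."""
    return tuple(
        sum(cam.R[i][j] * p[j] for j in range(3)) + cam.t[i] for i in range(3)
    )


def fuse_keypoint(
    observations: List[Tuple[Camera, float, float, float, float]]
) -> Vec3:
    """Confidence-weighted mean of one joint seen by several sensors.

    Each observation is (camera, u, v, depth, confidence).
    """
    acc = [0.0, 0.0, 0.0]
    total = 0.0
    for cam, u, v, depth, conf in observations:
        if depth <= 0.0 or conf <= 0.0:
            continue  # skip missing depth or undetected joints
        pw = to_world(cam, backproject(cam, u, v, depth))
        for i in range(3):
            acc[i] += conf * pw[i]
        total += conf
    if total == 0.0:
        raise ValueError("no valid observations for this joint")
    return tuple(a / total for a in acc)
```

For example, two sensors with identical intrinsics (fx = fy = 500, cx = 320, cy = 240), one at the world origin and one shifted 1 m along x, observing the same joint at pixels (445, 240) and (195, 240) with 2 m depth, both map it to approximately (0.5, 0.0, 2.0); the fused estimate is the same point regardless of the confidence weights.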
Abstract
We present a system for 3D semantic scene perception consisting of a network of distributed smart edge sensors. The sensor nodes are based on an embedded CNN inference accelerator and RGB-D and thermal cameras. Efficient vision CNN models for object detection, semantic segmentation, and human pose estimation run on-device in real time. 2D human keypoint estimations, augmented with the RGB-D depth estimate, as well as semantically annotated point clouds are streamed from the sensors to a central backend, where multiple viewpoints are fused into an allocentric 3D semantic scene model. As the image interpretation is computed locally, only semantic information is sent over the network. The raw images remain on the sensor boards, significantly reducing the required bandwidth, and mitigating privacy risks for the observed persons. We evaluate the proposed system in challenging real-world…
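The abstract describes semantically annotated point clouds from several sensors being fused into one allocentric scene model. One simple way to picture such fusion is a voxel grid in which every labeled point casts a vote for its class and each voxel takes the majority label. The sketch below assumes the points are already transformed into the shared world frame; the class `SemanticVoxelMap`, its methods, and the 5 cm default voxel size are hypothetical illustrations, and majority voting is a swapped-in technique, not necessarily the paper's method.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


class SemanticVoxelMap:
    """Accumulates labeled points from many sensors into one voxel grid."""

    def __init__(self, voxel_size: float = 0.05) -> None:
        # Hypothetical resolution in meters; not taken from the paper.
        self.voxel_size = voxel_size
        # Per-voxel histogram of class labels.
        self.votes: Dict[Voxel, Counter] = defaultdict(Counter)

    def key(self, p: Point) -> Voxel:
        # math.floor (not int()) so negative coordinates land in the correct cell.
        return tuple(math.floor(c / self.voxel_size) for c in p)

    def integrate(self, labeled_points: Iterable[Tuple[Point, int]]) -> None:
        """Add one sensor's labeled point cloud (world frame) to the map."""
        for p, label in labeled_points:
            self.votes[self.key(p)][label] += 1

    def label_of(self, p: Point) -> int:
        """Majority label of the voxel containing p, or -1 if never observed."""
        counts = self.votes.get(self.key(p))  # .get avoids creating empty voxels
        if not counts:
            return -1
        return counts.most_common(1)[0][0]
```

In use, each incoming point cloud is passed to `integrate`, and queries return the label most sensors agreed on. Voting makes the fused model tolerant to occasional misclassifications from a single viewpoint, at the cost of discarding per-point confidence.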
Taxonomy
Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · Human Pose and Action Recognition
