StixelNExT++: Lightweight Monocular Scene Segmentation and Representation for Collective Perception
Marcel Vosshans, Omar Ait-Aider, Youcef Mezouar, Markus Enzweiler

TL;DR
StixelNExT++ introduces a lightweight, real-time monocular scene segmentation method that efficiently infers 3D Stixels, improving object segmentation and scene representation for autonomous perception systems.
Contribution
It extends the Stixel representation to monocular perception, enabling real-time 3D scene inference with high compression and adaptability, using a lightweight neural network trained on automatically generated LiDAR-based ground truth.
Findings
Achieves real-time processing, with computation times as low as 10 ms per frame.
Demonstrates competitive performance on the Waymo dataset within a 30-meter range.
Provides a scalable, lightweight scene representation for autonomous perception.
Abstract
This paper presents StixelNExT++, a novel approach to scene representation for monocular perception systems. Building on the established Stixel representation, our method infers 3D Stixels and enhances object segmentation by clustering smaller 3D Stixel units. The approach achieves high compression of scene information while remaining adaptable to point cloud and bird's-eye-view representations. Our lightweight neural network, trained on automatically generated LiDAR-based ground truth, achieves real-time performance with computation times as low as 10 ms per frame. Experimental results on the Waymo dataset demonstrate competitive performance within a 30-meter range, highlighting the potential of StixelNExT++ for collective perception in autonomous systems.
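The abstract says that Stixels convert readily into point cloud and bird's-eye-view (BEV) representations, and that object segmentation comes from clustering smaller 3D Stixel units. This summary does not give the paper's exact Stixel parameterization, so the following is only a minimal sketch under assumed conventions. Each Stixel is a hypothetical `Stixel` record holding an image column, top and bottom rows, and a single depth, and it is back-projected with a standard pinhole camera model using intrinsics `fx`, `fy`, `cx`, and `cy`. The helpers `stixel_to_points`, `stixels_to_bev`, and `cluster_stixels`, along with the 0.5 m grid cell, the 30 m extent, and the depth-gap clustering rule, are illustrative assumptions and not the authors' implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Stixel:
    """Hypothetical Stixel: a vertical image segment in one column with one depth."""
    u: int         # image column (pixels)
    v_top: int     # top row of the segment (pixels)
    v_bottom: int  # bottom row of the segment (pixels, inclusive)
    depth: float   # distance along the optical axis (meters)


def stixel_to_points(s, fx, fy, cx, cy, step=1):
    """Back-project every `step`-th row of a Stixel into camera coordinates (pinhole model)."""
    rows = np.arange(s.v_top, s.v_bottom + 1, step, dtype=float)
    x = (s.u - cx) * s.depth / fx          # one lateral offset for the whole column
    y = (rows - cy) * s.depth / fy         # vertical offset per row
    z = np.full_like(rows, s.depth)        # constant depth along the Stixel
    return np.stack([np.full_like(rows, x), y, z], axis=1)  # shape (N, 3)


def stixels_to_bev(stixels, fx, cx, cell=0.5, extent=30.0):
    """Mark each Stixel's ground footprint in a BEV occupancy grid (forward x lateral)."""
    grid = np.zeros((int(extent / cell), int(2 * extent / cell)), dtype=np.uint8)
    for s in stixels:
        x = (s.u - cx) * s.depth / fx      # lateral position (meters)
        i = int(s.depth / cell)            # forward cell index
        j = int((x + extent) / cell)       # lateral cell index, centered on the camera
        if 0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]:
            grid[i, j] = 1
    return grid


def cluster_stixels(stixels, max_depth_gap=1.0, max_col_gap=8):
    """Greedily group Stixels in nearby columns with similar depth (assumed criterion)."""
    clusters = []
    for s in sorted(stixels, key=lambda st: st.u):
        last = clusters[-1][-1] if clusters else None
        if (last is not None
                and s.u - last.u <= max_col_gap
                and abs(s.depth - last.depth) < max_depth_gap):
            clusters[-1].append(s)         # extend the current object
        else:
            clusters.append([s])           # start a new object
    return clusters
```

In this sketch, each Stixel stores one depth for a whole column segment, so the point cloud and BEV conversions need nothing beyond the camera intrinsics. The greedy left-to-right clustering is deliberately simple. A real system would likely use a learned or more robust grouping criterion.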
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
