A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion
Dennis Bank, Joost Cordes, Thomas Seel, and Simon F.G. Ehlers

TL;DR
This paper introduces a hybrid neural network framework that fuses multimodal sensor data to generate accurate, temporally consistent heightmaps for humanoid robot terrain perception, enhancing robustness in unstructured environments.
Contribution
The paper proposes a hybrid Encoder-Decoder architecture that combines a CNN for spatial feature extraction with a GRU core for temporal consistency, fusing multimodal sensor data into robot-centric heightmaps.
Findings
Multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations.
Temporal integration reduces mapping drift by 3.2 seconds.
The approach outperforms single-sensor configurations in accuracy.
Abstract
Reliable terrain perception is a critical prerequisite for the deployment of humanoid robots in unstructured, human-centric environments. While traditional systems often rely on manually engineered, single-sensor pipelines, this paper presents a learning-based framework that uses an intermediate, robot-centric heightmap representation. A hybrid Encoder-Decoder Structure (EDS) is introduced, utilizing a Convolutional Neural Network (CNN) for spatial feature extraction fused with a Gated Recurrent Unit (GRU) core for temporal consistency. The architecture integrates multimodal data from an Intel RealSense depth camera, a LIVOX MID-360 LiDAR processed via efficient spherical projection, and an onboard IMU. Quantitative results demonstrate that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations. Furthermore, the integration of…
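The abstract mentions that the LiDAR point cloud is processed via spherical projection but does not give details. As a rough illustration of the general technique (not the authors' implementation), the sketch below maps 3D points to a 2D range image indexed by azimuth and elevation, keeping the nearest return per pixel. The function name, the image resolution, and the vertical field-of-view limits are assumptions chosen for illustration.

```python
import math


def spherical_projection(points, width=64, height=16,
                         fov_up_deg=52.0, fov_down_deg=-7.0):
    """Project (x, y, z) points onto a height x width range image.

    All defaults are illustrative assumptions, not values from the paper.
    Empty pixels hold 0.0; occupied pixels hold the nearest range in metres.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down

    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # skip degenerate returns at the sensor origin

        azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        elevation = math.asin(z / r)    # vertical angle in [-pi/2, pi/2]
        if not (fov_down <= elevation <= fov_up):
            continue  # outside the assumed vertical field of view

        # Normalise both angles to [0, 1], then scale to pixel indices.
        u = 0.5 * (1.0 - azimuth / math.pi)
        v = (fov_up - elevation) / fov  # 0 is the top row
        col = min(width - 1, int(u * width))
        row = min(height - 1, int(v * height))

        # Keep the closest return when several points share a pixel.
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image
```

The resulting dense 2D grid is the kind of input a convolutional encoder can consume directly, which is the usual motivation for projecting a sparse point cloud this way.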
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Optical Sensing Technologies · Advanced Vision and Imaging
