HENet++: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving
Zhongyu Xia, Zhiwei Lin, Yongtao Wang, Ming-Hsuan Yang

TL;DR
HENet++ is a hybrid encoding and multi-task learning framework for 3D perception and end-to-end autonomous driving. It efficiently combines features from different temporal scales and representations, and achieves state-of-the-art results on nuScenes.
Contribution
The paper proposes a novel hybrid image encoding network and multi-task learning framework that improves 3D perception accuracy and efficiency in autonomous driving systems.
Findings
Achieves state-of-the-art results on the nuScenes 3D perception benchmark.
Attains the lowest collision rate on the nuScenes autonomous driving benchmark.
Supports multimodal inputs and is compatible with existing 3D feature extraction methods.
Abstract
Three-dimensional feature extraction is a critical component of autonomous driving systems, where perception tasks such as 3D object detection, bird's-eye-view (BEV) semantic segmentation, and occupancy prediction serve as important constraints on 3D features. While large image encoders, high-resolution images, and long-term temporal inputs can significantly enhance feature quality and deliver remarkable performance gains, these techniques are often incompatible in both training and inference due to computational resource constraints. Moreover, different tasks favor distinct feature representations, making it difficult for a single model to perform end-to-end inference across multiple tasks while maintaining accuracy comparable to that of single-task models. To alleviate these issues, we present the HENet and HENet++ framework for multi-task 3D perception and end-to-end autonomous…
Taxonomy
Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Visual Attention and Saliency Detection
