HENet++: Hybrid Encoding and Multi-task Learning for 3D Perception and End-to-end Autonomous Driving

Zhongyu Xia; Zhiwei Lin; Yongtao Wang; Ming-Hsuan Yang

arXiv:2511.07106 · cs.CV · November 11, 2025

TL;DR

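HENet++ introduces a hybrid encoding and multi-task learning framework that improves 3D perception and end-to-end autonomous driving by efficiently combining features from different temporal scales and representations. It achieves state-of-the-art results on the nuScenes 3D perception benchmark and the lowest collision rate on the nuScenes autonomous driving benchmark.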

Contribution

The paper proposes a novel hybrid image encoding network and multi-task learning framework that improves 3D perception accuracy and efficiency in autonomous driving systems.

Findings

1. Achieves state-of-the-art results on the nuScenes 3D perception benchmark.

2. Attains the lowest collision rate on the nuScenes autonomous driving benchmark.

3. Supports multimodal inputs and is compatible with existing 3D feature extraction methods.

Abstract

Three-dimensional feature extraction is a critical component of autonomous driving systems, where perception tasks such as 3D object detection, bird's-eye-view (BEV) semantic segmentation, and occupancy prediction serve as important constraints on 3D features. While large image encoders, high-resolution images, and long-term temporal inputs can significantly enhance feature quality and deliver remarkable performance gains, these techniques are often incompatible in both training and inference due to computational resource constraints. Moreover, different tasks favor distinct feature representations, making it difficult for a single model to perform end-to-end inference across multiple tasks while maintaining accuracy comparable to that of single-task models. To alleviate these issues, we present the HENet and HENet++ framework for multi-task 3D perception and end-to-end autonomous…

Taxonomy

Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Visual Attention and Saliency Detection