Human Insights Driven Latent Space for Different Driving Perspectives: A Unified Encoder for Efficient Multi-Task Inference

Huy-Dung Nguyen; Anass Bairouk; Mirjana Maras; Wei Xiao; Tsun-Hsuan Wang; Patrick Chareyre; Ramin Hasani; Marc Blanchon; Daniela Rus

arXiv:2409.10095 · cs.CV · April 3, 2026

TL;DR

This paper introduces a unified encoder trained on multiple vision tasks to improve autonomous driving perception, demonstrating strong generalization and superior steering estimation performance.

Contribution

The work presents a multi-task trained encoder that captures rich visual features, outperforming generic pretraining methods in autonomous driving scenarios.

Findings

01. The unified encoder achieves competitive performance across perception tasks.

02. For steering estimation, the frozen encoder, using its dense latent features, outperforms fine-tuned models.

03. Multi-task learning enhances robustness and generalization in driving perception.
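The second finding contrasts a frozen encoder with fine-tuned ones. In practice, "frozen" means that only a downstream head is trained, while the encoder's outputs are computed once and never updated. The sketch below shows that pattern as a generic linear probe, not as the paper's steering model. The encoder, its two features, the synthetic data and the gradient-descent linear head are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (image pixels, steering target)


def frozen_encoder(image: List[float]) -> List[float]:
    """Hypothetical stand-in for a pretrained encoder; it has no trainable state."""
    return [sum(image) / len(image), image[-1] - image[0]]


def mse(w: List[float], b: float, feats: List[Tuple[List[float], float]]) -> float:
    """Mean squared steering error of a linear head on precomputed latents."""
    total = 0.0
    for z, y in feats:
        pred = sum(wi * zi for wi, zi in zip(w, z)) + b
        total += (pred - y) ** 2
    return total / len(feats)


def train_steering_head(data: List[Sample], lr: float = 0.1, epochs: int = 1000):
    # The encoder is frozen, so latents are computed once and reused every epoch.
    feats = [(frozen_encoder(img), y) for img, y in data]
    dim = len(feats[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for z, y in feats:
            err = sum(wi * zi for wi, zi in zip(w, z)) + b - y
            for i in range(dim):
                grad_w[i] += 2.0 * err * z[i] / n
            grad_b += 2.0 * err / n
        # Only the head's parameters (w, b) are updated; the encoder is untouched.
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, feats


# Synthetic toy data: the steering target is half the right-minus-left intensity.
data: List[Sample] = []
for k in range(10):
    image = [k / 10, 0.2, 0.5, 1.0 - k / 10]
    data.append((image, 0.5 * (image[-1] - image[0])))

w, b, feats = train_steering_head(data)
print(mse([0.0, 0.0], 0.0, feats), mse(w, b, feats))  # loss before vs. after training
```

Freezing the encoder also makes training cheaper, because the expensive feature extraction runs once per sample rather than once per epoch.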

Abstract

Autonomous driving systems require a comprehensive understanding of the environment, achieved by extracting visual features essential for perception, planning, and control. However, models trained solely on single-task objectives or generic datasets often lack the contextual information needed for robust performance in complex driving scenarios. In this work, we propose a unified encoder trained on multiple computer vision tasks crucial for urban driving, including depth, pose, and 3D scene flow estimation, as well as semantic, instance, panoptic, and motion segmentation. By integrating these diverse visual cues, similar to human perceptual mechanisms, the encoder captures rich features that enhance navigation-related predictions. We evaluate the model on steering estimation as a downstream task, leveraging its dense latent space. To ensure efficient multi-task learning, we introduce a…
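The abstract describes a single encoder whose latent space serves several tasks. The minimal sketch below shows why this is efficient at inference: the image is encoded once, and every task head reads the same latent. Everything in it is hypothetical and does not reproduce the authors' architecture. That includes `ToyEncoder`, its two-number "latent" and the scalar head formulas. Real depth or segmentation heads would produce dense maps, not single numbers.

```python
from typing import Callable, Dict, List

# Hypothetical task head: maps a shared latent vector to one scalar prediction.
Head = Callable[[List[float]], float]


class ToyEncoder:
    """Hypothetical stand-in for a shared encoder; counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: List[float]) -> List[float]:
        self.calls += 1
        # Toy "latent": mean and max pixel value (a real encoder is a deep network).
        return [sum(image) / len(image), max(image)]


def multi_task_inference(
    encoder: ToyEncoder, heads: Dict[str, Head], image: List[float]
) -> Dict[str, float]:
    """Run the encoder once, then decode every task from the same latent."""
    latent = encoder(image)
    return {name: head(latent) for name, head in heads.items()}


# Toy scalar heads; the names echo tasks from the abstract, the formulas are made up.
heads: Dict[str, Head] = {
    "depth": lambda z: 2.0 * z[0],
    "semantic": lambda z: z[1] - z[0],
    "steering": lambda z: 0.5 * z[0] - 0.1 * z[1],
}

encoder = ToyEncoder()
outputs = multi_task_inference(encoder, heads, [0.0, 1.0, 2.0, 3.0])
print(outputs)        # one prediction per task
print(encoder.calls)  # the encoder ran a single time for all three heads
```

With separate per-task networks, the image would be encoded once per task. Sharing one encoder spreads that cost across all heads, which is the efficiency argument behind "efficient multi-task inference" in the title.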

Code & Models

Repositories

https://hi-computervision.github.io/uni-encoder