# HiPro-AD: Sparse Trajectory Transformer for End-to-End Autonomous Driving with Hybrid Spatiotemporal Attention

**Authors:** Bing Chen, Gaopeng Wang, Jiandong Yang, Shaoliang Huang, Xinhe Qian, Bin Huang, Guanlun Guo

PMC · DOI: 10.3390/s26010185 · Sensors (Basel, Switzerland) · 2025-12-26

## TL;DR

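HiPro-AD is an end-to-end autonomous driving planner that replaces dense Bird's-Eye-View feature maps with sparse trajectory proposals and proposal-anchored spatiotemporal attention. Using only camera input, it reaches a PDMS of 92.6 on NAVSIM, and it runs at 67 ms latency on Bench2Drive.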

## Contribution

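HiPro-AD introduces a sparse, proposal-centric planning framework for end-to-end autonomous driving. It combines an efficiency-oriented IM-ResNet-34 encoder, a novel STFormer that fuses multi-view spatial features and historical temporal context around trajectory proposals, and a Pairwise Ranking Scorer that selects the final plan from the candidates.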

## Key findings

- HiPro-AD achieves a PDMS of 92.6 on the NAVSIM benchmark using only camera input.
- On the closed-loop Bench2Drive benchmark, it attains a 37.31% success rate and a driving score of 65.48 at a latency of 67 ms.

## Abstract

End-to-end (E2E) autonomous driving offers a promising alternative to traditional modular pipelines by mapping raw sensor data directly to vehicle controls, thereby mitigating error propagation. However, prevalent approaches largely rely on dense Bird’s-Eye-View (BEV) feature maps, which incur high computational overhead and necessitate complex post-processing for trajectory generation. To address these limitations, we propose HiPro-AD, a proposal-centric sparse E2E planning framework that fundamentally diverges from dense BEV paradigms. HiPro-AD integrates an efficiency-oriented IM-ResNet-34 encoder with a novel STFormer. This transformer dynamically fuses multi-view spatial features and historical temporal context via a proposal-anchored mechanism, focusing computation strictly on regions relevant to sparse trajectory proposals. Furthermore, trajectory selection is refined by a Pairwise Ranking Scorer, which identifies the optimal plan from diverse candidates based on relative quality. On the NAVSIM benchmark, HiPro-AD achieves a PDMS of 92.6 using only camera input, surpassing prior dense BEV and multimodal methods. On the closed-loop Bench2Drive benchmark, it attains a 37.31% success rate and a driving score of 65.48 with a latency of 67 ms, demonstrating real-time capability. These results validate the efficiency and robustness of our sparse paradigm in complex driving scenarios.
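
The abstract states only that the Pairwise Ranking Scorer picks the best plan "based on relative quality"; it does not give the scorer's architecture or loss. As a rough illustration of the general idea, the following is a minimal sketch that assumes each candidate trajectory already has a scalar score and that training uses a standard pairwise logistic (Bradley-Terry style) ranking loss. The function names, the `quality` reference signal, and the win-count selection rule are all hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_ranking_loss(scores: Sequence[float], quality: Sequence[float]) -> float:
    """Mean pairwise logistic loss over pairs whose reference quality differs.

    ASSUMPTION: `quality` is a hypothetical per-candidate reference signal
    (for example a simulator metric). The paper's actual supervision is not
    described in this summary.
    """
    total, count = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if quality[i] > quality[j]:
                # -log(sigmoid(s_i - s_j)): small when the better
                # candidate i is scored above the worse candidate j.
                total += softplus(scores[j] - scores[i])
                count += 1
    return total / count if count else 0.0


def select_best(scores: Sequence[float]) -> int:
    """Index of the candidate with the most expected pairwise wins.

    ASSUMPTION: the probability that i beats j is sigmoid(s_i - s_j).
    """
    wins = [
        sum(sigmoid(s_i - s_j) for j, s_j in enumerate(scores) if j != i)
        for i, s_i in enumerate(scores)
    ]
    return max(range(len(scores)), key=wins.__getitem__)


# Three hypothetical candidate trajectories.
candidate_scores = [0.2, 1.5, -0.3]
reference_quality = [0.5, 0.9, 0.1]
best_index = select_best(candidate_scores)
loss = pairwise_ranking_loss(candidate_scores, reference_quality)
```

With scalar scores, counting expected wins selects the same candidate as taking the highest score. The pairwise form would matter more if the comparator looked at both candidates jointly, which this sketch does not attempt to model.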

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12787977/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12787977/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12787977/full.md

---
Source: https://tomesphere.com/paper/PMC12787977