A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard
Changwen Li, Joseph Sifakis, Rongjie Yan, Jian Zhang

TL;DR
This study evaluates four end-to-end AI autopilots for autonomous driving using the CCTest approach, comparing their safety performance with modular autopilots and analyzing discrepancies with leaderboard assessments.
Contribution
It extends the CCTest evaluation to end-to-end autopilots and compares results with modular autopilots and leaderboard scores, highlighting differences in failure modes and assessment methods.
Findings
End-to-end autopilots show different failure patterns compared to modular ones.
Significant discrepancies exist between CCTest results and leaderboard evaluations.
The study emphasizes the need for objective, combined qualitative and quantitative assessment criteria.
Abstract
End-to-end AI autopilots for autonomous driving systems have emerged as a promising alternative to traditional modular autopilots, offering the potential to reduce development costs and to mitigate defects arising from module composition. However, they suffer from the well-known problems of AI systems, such as non-determinism, non-explainability, and anomalies. This naturally raises the question of how to evaluate them and, in particular, how they compare with existing modular solutions. This work extends a study of the critical configuration testing (CCTest) approach, previously applied to four open modular autopilots. CCTest differs from other approaches in that it generates only test cases for which a safe control policy exists for the tested autopilots. This enables an accurate assessment of the ability to drive safely in critical situations, as any incident observed in the simulation…
