Data Leakage in Automotive Perception: Practitioners' Insights

Md Abu Ahammed Babu; Sushant Kumar Pandey; Darko Durisic; Andras Balint; Miroslaw Staron

arXiv:2604.06899 · cs.CR · April 9, 2026

TL;DR

This study explores how automotive perception practitioners understand and manage data leakage, revealing role-based perceptions and emphasizing the importance of shared practices and communication for ML reliability.

Contribution

It provides empirical insights into practitioners' perceptions, highlighting the socio-technical nature of data leakage management in automotive ML development.

Findings

01. Knowledge of data leakage is widespread but role-dependent.

02. Detection often relies on performance anomalies rather than specific tools.

03. Prevention depends on experience and knowledge sharing.

Abstract

Data leakage, the inadvertent transfer of information between training and evaluation datasets, poses a subtle yet critical risk to the reliability of machine learning (ML) models in safety-critical systems such as automotive perception. While leakage is widely recognized in research, little is known about how industrial practitioners perceive and manage it in practice. This study investigates practitioners' knowledge, experiences, and mitigation strategies around data leakage through ten semi-structured interviews with system design, development, and verification engineers working on the development of automotive perception functions. Using reflexive thematic analysis, we find that knowledge of data leakage is widespread but fragmented along role boundaries: ML engineers conceptualize it as a data-splitting or validation issue, whereas design and verification roles…
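To make the data-splitting view of leakage concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes, purely for illustration, that each camera frame carries a hypothetical `sequence_id` naming the recording it came from, and that frames from the same recording are similar enough to leak information if they land on both sides of a split. The helper names `split_by_group` and `shared_groups` are likewise invented for this example.

```python
import random
from collections import defaultdict


def split_by_group(frames, group_key, eval_fraction=0.2, seed=0):
    """Assign whole groups (e.g. recordings) to either train or evaluation."""
    groups = defaultdict(list)
    for frame in frames:
        groups[frame[group_key]].append(frame)

    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_eval = max(1, round(len(group_ids) * eval_fraction))
    eval_ids = set(group_ids[:n_eval])

    train = [f for g in group_ids if g not in eval_ids for f in groups[g]]
    evaluation = [f for g in group_ids if g in eval_ids for f in groups[g]]
    return train, evaluation


def shared_groups(train, evaluation, group_key):
    """Return the group ids that appear in both splits (a leakage signal)."""
    return {f[group_key] for f in train} & {f[group_key] for f in evaluation}


# Hypothetical data: 100 frames from 10 recordings of 10 frames each.
frames = [{"frame_id": i, "sequence_id": i // 10} for i in range(100)]

# Frame-level split: every fifth frame goes to evaluation, so every
# recording contributes frames to both sides.
naive_eval = [f for f in frames if f["frame_id"] % 5 == 0]
naive_train = [f for f in frames if f["frame_id"] % 5 != 0]

# Group-level split: each recording lands entirely on one side.
train, evaluation = split_by_group(frames, "sequence_id")

print(len(shared_groups(naive_train, naive_eval, "sequence_id")))
print(len(shared_groups(train, evaluation, "sequence_id")))
```

The overlap check is the kind of explicit test that could complement spotting leakage through performance anomalies. Which grouping key is appropriate (recording, drive, location, vehicle) depends on the dataset and is an assumption here.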
