Scalable Language Agnostic Taint Tracking using Explicit Data Dependencies
Sedick David Baker Effendi, Xavier Pinho, Andrei Michael Dreyer, Fabian Yamaguchi

TL;DR
This paper introduces a scalable, language-agnostic taint tracking system that reduces the manual annotation needed to model library procedures and addresses the performance problems of whole-program data-dependence graphs when analyzing large codebases for vulnerabilities.
Contribution
It presents a novel data-dependence representation that handles missing annotations via over-approximation, integrated into the open-source Joern platform.
Findings
Enables scalable taint analysis across multiple programming languages.
Reduces manual effort by over-approximating data flows through library procedures that lack annotations.
Improves analysis speed and flexibility in continuous development environments.
Abstract
Taint analysis using explicit whole-program data-dependence graphs is powerful for vulnerability discovery but faces two major challenges. First, accurately modeling taint propagation through calls to external library procedures requires extensive manual annotations, which becomes impractical for large ecosystems. Second, the sheer size of whole-program graph representations leads to serious scalability and performance issues, particularly when quick analysis is needed in continuous development pipelines. This paper presents the design and implementation of a system for a language-agnostic data-dependence representation. The system accommodates missing annotations describing the behavior of library procedures by over-approximating data flows, allowing annotations to be added later without recalculation. We contribute this data-flow analysis system to the open-source code analysis…
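To make the idea of over-approximating unannotated library calls concrete, here is a minimal sketch in Python. It is not the paper's or Joern's implementation. The names `RETURN`, `over_approximate`, and `reachable`, the annotation format (a set of `(source, destination)` index pairs per procedure), and the example procedure `copy` are all hypothetical. The sketch assumes one possible reading of "annotations can be added later without recalculation": the conservative edges are computed independently of annotations, and an annotation only filters them at query time.

```python
# Hypothetical marker for a call's return value; arguments use indices 0..n-1.
RETURN = -1


def over_approximate(num_args):
    """Conservative default: every argument may flow to every argument
    (including itself) and to the return value."""
    destinations = list(range(num_args)) + [RETURN]
    return {(src, dst) for src in range(num_args) for dst in destinations}


def reachable(name, num_args, tainted_args, annotations):
    """Return the positions tainted after calling `name`.

    `annotations` maps a procedure name to the set of (src, dst) flows it
    actually performs. This format is an assumption made for illustration.
    """
    # Computed without consulting annotations, so it never needs redoing.
    edges = over_approximate(num_args)
    allowed = annotations.get(name)
    if allowed is not None:
        # An annotation only narrows the precomputed edges.
        edges = edges & allowed
    return {dst for (src, dst) in edges if src in tainted_args}


# Without an annotation, taint on argument 0 spreads everywhere.
unannotated = reachable("copy", 2, {0}, {})

# A later annotation says only argument 1 flows to argument 0 and the return.
annotations = {"copy": {(1, 0), (1, RETURN)}}
from_arg0 = reachable("copy", 2, {0}, annotations)
from_arg1 = reachable("copy", 2, {1}, annotations)
```

In this sketch, the unannotated call may report flows that do not exist but never misses one, and adding the annotation for `copy` changes only how the stored edges are filtered.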
