Trojan Source: Invisible Vulnerabilities
Nicholas Boucher, Ross Anderson

TL;DR
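Trojan Source attacks exploit subtleties of text-encoding standards such as Unicode so that source code is displayed in a different order from the one in which it is logically encoded. Human reviewers and compilers therefore read different programs, which creates security risks across many programming languages and development tools.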
Contribution
The paper introduces Trojan Source vulnerabilities, demonstrates working attacks in twelve languages, and proposes compiler-level defenses along with mitigations for editors, repositories, and build pipelines.
Findings
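Identified Trojan Source vulnerabilities in C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity.
Demonstrated working attack examples in each of these languages.
Proposed compiler-level defenses, with interim mitigations for editors, repositories, and build pipelines.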
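The summary above does not spell out what a compiler-level defense looks like. One plausible form, sketched here as an assumption rather than as the authors' implementation, is to reject any comment or string literal that contains an unterminated Unicode bidirectional control character. The function name `has_unterminated_bidi` is hypothetical, and the matching logic is a simplification of the Unicode Bidirectional Algorithm.

```python
# Assumption: a compiler-style check that flags a token (for example a comment
# or string literal) whose bidirectional formatting is opened but never closed.
# This is a simplified sketch, not the paper's reference implementation.

EMBEDDING_OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: closes an embedding or override
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes an isolate


def has_unterminated_bidi(text: str) -> bool:
    """Return True if `text` leaves a bidi embedding, override, or isolate open."""
    stack = []
    for ch in text:
        if ch in EMBEDDING_OPENERS:
            stack.append("embedding")
        elif ch in ISOLATE_OPENERS:
            stack.append("isolate")
        elif ch == PDF:
            # PDF only closes the innermost embedding or override.
            if stack and stack[-1] == "embedding":
                stack.pop()
        elif ch == PDI:
            # PDI closes the innermost isolate and anything opened inside it.
            if "isolate" in stack:
                while stack.pop() != "isolate":
                    pass
    return bool(stack)
```

A compiler or linter could call such a check on each comment and string literal and raise an error when it returns True, since an unterminated control is what lets reordering spill into the surrounding code.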
Abstract
We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers. 'Trojan Source' attacks, as we call them, pose an immediate threat both to first-party software and of supply-chain compromise across the industry. We present working examples of Trojan Source attacks in C, C++, C#, JavaScript, Java, Rust, Go, Python, SQL, Bash, Assembly, and Solidity. We propose definitive compiler-level defenses, and describe other mitigating controls that can be deployed in editors, repositories, and build pipelines while compilers are upgraded to…
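As a rough illustration of the repository and build-pipeline controls the abstract mentions, the sketch below scans source text for Unicode bidirectional control characters and reports where they occur. It assumes that simply flagging every such character is acceptable for the codebase in question. The function name `find_bidi_controls` and the report format are hypothetical and are not taken from the paper.

```python
import unicodedata

# Bidirectional embedding, override, and isolate controls, written as escapes
# so that this file itself contains no invisible characters.
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(source: str):
    """Return (line, column, code point, name) for each bidi control found.

    Line and column numbers are 1-based; columns count code points.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch))
                )
    return findings


if __name__ == "__main__":
    # Hypothetical input: a string literal that hides reordering controls.
    sample = 'ok = 1\nif level != "user\u202E \u2066// check\u2069 \u2066":\n    pass\n'
    for finding in find_bidi_controls(sample):
        print(finding)
```

In a pipeline, a non-empty result could fail the build or require explicit reviewer sign-off. Projects that legitimately contain right-to-left text in strings or comments would need a narrower policy than this blanket scan.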
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Web Application Security Vulnerabilities
