# Detecting Stealthy Data Poisoning Attacks in AI Code Generators

**Authors:** Cristina Improta

arXiv: 2508.21636 · 2025-09-01

## TL;DR

This paper investigates how hard it is to detect subtle, triggerless data poisoning attacks on AI code generators, and finds that the evaluated defenses (spectral signatures analysis, activation clustering, and static analysis) all struggle against this stealthy threat.

## Contribution

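The paper systematically evaluates three existing poisoning detection methods (spectral signatures analysis, activation clustering, and static analysis) against triggerless targeted poisoning of three models (CodeBERT, CodeT5+, AST-T5), highlighting their limitations and the need for more robust, trigger-independent defenses.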

## Key findings

- All evaluated detection methods struggle to detect triggerless poisoning
- Representation-based approaches (spectral signatures analysis, activation clustering) fail to isolate poisoned samples
- Static analysis suffers from both false positives and false negatives
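
To make the last finding concrete, here is a minimal sketch of how a pattern-based static check can yield both error types. The rule (`flags_shell_true`) and the four snippets are hypothetical illustrations, not the static analysis tool or the samples used in the paper.

```python
import ast


def flags_shell_true(source):
    """Toy rule (hypothetical): report any call that passes shell=True."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if (kw.arg == "shell"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    return True
    return False


# Snippets are only parsed, never executed, so `host` may stay undefined.
SAMPLES = {
    # Vulnerable and caught: external input reaches a shell.
    "true_positive": "import subprocess\nsubprocess.run('ping ' + host, shell=True)\n",
    # Vulnerable but missed: same injection risk through a different API.
    "false_negative": "import os\nos.system('ping ' + host)\n",
    # Arguably harmless but flagged: constant command, no external input.
    "false_positive": "import subprocess\nsubprocess.run('date', shell=True)\n",
    # Safe and not flagged: argument list, no shell involved.
    "true_negative": "import subprocess\nsubprocess.run(['ping', host])\n",
}

for label, code in SAMPLES.items():
    print(label, flags_shell_true(code))
```

The rule matches syntax rather than behavior, so an equivalent vulnerable implementation written with another API slips through, while a benign use of the flagged pattern is reported.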

## Abstract

Deep learning (DL) models for natural language-to-code generation have become integral to modern software development pipelines. However, their heavy reliance on large amounts of data, often collected from unsanitized online sources, exposes them to data poisoning attacks, where adversaries inject malicious samples to subtly bias model behavior. Recent targeted attacks silently replace secure code with semantically equivalent but vulnerable implementations without relying on explicit triggers to launch the attack, making it especially hard for detection methods to distinguish clean from poisoned samples. We present a systematic study on the effectiveness of existing poisoning detection methods under this stealthy threat model. Specifically, we perform targeted poisoning on three DL models (CodeBERT, CodeT5+, AST-T5), and evaluate spectral signatures analysis, activation clustering, and static analysis as defenses. Our results show that all methods struggle to detect triggerless poisoning, with representation-based approaches failing to isolate poisoned samples and static analysis suffering false positives and false negatives, highlighting the need for more robust, trigger-independent defenses for AI-assisted code generation.
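
As a rough illustration of one of the representation-based defenses named in the abstract, the sketch below scores samples in the style of spectral signatures analysis: center the representations, find the top singular direction, and rank samples by their squared projection onto it. Everything here is an assumption made for illustration: the vectors are synthetic Gaussians rather than model representations, `toy_recall` is a hypothetical helper, the `1.5` removal multiplier is a commonly used default and not a value taken from this paper, and power iteration stands in for a full SVD so the example needs only the standard library.

```python
import math
import random


def spectral_scores(reps, iters=100, seed=0):
    """Outlier score per sample: squared projection of the centered
    representation onto the (approximate) top right singular vector."""
    n, d = len(reps), len(reps[0])
    mean = [sum(row[j] for row in reps) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in reps]

    # Power iteration on M^T M (M = centered matrix) approximates the top
    # right singular vector without a linear algebra library.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all representations identical
            break
        v = [x / norm for x in w]
    return [sum(a * b for a, b in zip(row, v)) ** 2 for row in centered]


def flag_suspicious(reps, expected_poison_rate, multiplier=1.5):
    """Indices of the highest-scoring samples, i.e. the removal candidates."""
    scores = spectral_scores(reps)
    k = min(len(reps), round(multiplier * expected_poison_rate * len(reps)))
    ranked = sorted(range(len(reps)), key=scores.__getitem__, reverse=True)
    return set(ranked[:k])


def toy_recall(shift, n_clean=180, n_poison=20, dim=8, seed=1):
    """Recall on synthetic vectors; poisoned rows are offset by `shift`."""
    rng = random.Random(seed)
    clean = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_clean)]
    poison = [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_poison)]
    reps = clean + poison
    poisoned_idx = set(range(n_clean, n_clean + n_poison))
    flagged = flag_suspicious(reps, n_poison / len(reps))
    return len(flagged & poisoned_idx) / n_poison


print("recall with a separable shift:", toy_recall(shift=6.0))
print("recall with no shift:", toy_recall(shift=0.0))
```

The two calls contrast a poison that shifts representations in a consistent direction, which the score can pick up, with one that leaves no such shift. The second case is a simplifying assumption meant to mirror the failure mode the abstract reports for triggerless poisoning, not a reproduction of the paper's measurements.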

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21636/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21636/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/2508.21636/full.md

---
Source: https://tomesphere.com/paper/2508.21636