# Agentic Discovery and Validation of Android App Vulnerabilities

**Authors:** Ziyue Wang, Liyi Zhou

arXiv: 2508.21579 · 2025-09-01

## TL;DR

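A2 is an agentic system that mirrors how security experts work. It first discovers candidate Android vulnerabilities by combining semantic reasoning with traditional security tools, then validates them by generating working Proof-of-Concepts (PoCs). This replaces thousands of low-signal warnings with a short list of findings backed by direct evidence of exploitability.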

## Contribution

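A2 introduces a two-phase approach that mirrors expert analysis: Agentic Vulnerability Discovery, which combines semantic understanding with traditional security tools, followed by Agentic Vulnerability Validation across Android's multi-modal attack surface (UI interactions, inter-component communication, file system operations, and cryptographic computations).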

## Key findings

- Achieves 78.3% coverage on the Ghera benchmark (n=60), surpassing state-of-the-art analyzers (e.g., APKHunt at 30.0%).
- Distills results into 82 speculative findings, including 47 Ghera cases and 28 additional true positives, and generates working PoCs for 51 of them.
- Uncovers 104 true-positive zero-day vulnerabilities in 169 production APKs, 57 (54.8%) of which are self-validated with automatically generated PoCs.
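The reported percentages can be cross-checked against the raw counts given in the abstract. The short Python sketch below does that arithmetic. The variable names are illustrative, and it assumes that benchmark coverage means detected Ghera cases divided by benchmark size, which is consistent with the reported 78.3%. The PoC rate on the benchmark findings and the count of uncharacterized findings are derived here and are not figures stated by the authors.

```python
# Raw counts as reported in the abstract; names are illustrative only.
GHERA_TOTAL = 60                 # benchmark size (n=60)
GHERA_DETECTED = 47              # Ghera cases among the speculative findings
SPECULATIVE_FINDINGS = 82        # total speculative findings on the benchmark
ADDITIONAL_TRUE_POSITIVES = 28   # true positives beyond the Ghera cases
BENCHMARK_POCS = 51              # speculative findings with a working PoC
REAL_WORLD_TRUE_POSITIVES = 104  # zero-days found in 169 production APKs
REAL_WORLD_POCS = 57             # of those, self-validated with a PoC


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Assumption: coverage = detected Ghera cases / benchmark size.
ghera_coverage = pct(GHERA_DETECTED, GHERA_TOTAL)

# Share of real-world true positives that come with a generated PoC.
real_world_poc_rate = pct(REAL_WORLD_POCS, REAL_WORLD_TRUE_POSITIVES)

# Derived, not reported: share of benchmark findings that received a PoC.
benchmark_poc_rate = pct(BENCHMARK_POCS, SPECULATIVE_FINDINGS)

# Derived, not reported: findings the abstract does not characterize.
uncharacterized = SPECULATIVE_FINDINGS - GHERA_DETECTED - ADDITIONAL_TRUE_POSITIVES

print(ghera_coverage, real_world_poc_rate, benchmark_poc_rate, uncharacterized)
```

The first two values reproduce the 78.3% and 54.8% quoted in the abstract. The last value shows that the 47 Ghera cases and 28 additional true positives do not account for all 82 speculative findings; the abstract does not say how the remainder are classified.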

## Abstract

Existing Android vulnerability detection tools overwhelm teams with thousands of low-signal warnings yet uncover few true positives. Analysts spend days triaging these results, creating a bottleneck in the security pipeline. Meanwhile, genuinely exploitable vulnerabilities often slip through, leaving opportunities open to malicious counterparts.

We introduce A2, a system that mirrors how security experts analyze and validate Android vulnerabilities through two complementary phases: (i) Agentic Vulnerability Discovery, which reasons about application security by combining semantic understanding with traditional security tools; and (ii) Agentic Vulnerability Validation, which systematically validates vulnerabilities across Android's multi-modal attack surface: UI interactions, inter-component communication, file system operations, and cryptographic computations.

On the Ghera benchmark (n=60), A2 achieves 78.3% coverage, surpassing state-of-the-art analyzers (e.g., APKHunt 30.0%). Rather than overwhelming analysts with thousands of warnings, A2 distills results into 82 speculative vulnerability findings, including 47 Ghera cases and 28 additional true positives. Crucially, A2 then generates working Proof-of-Concepts (PoCs) for 51 of these speculative findings, transforming them into validated vulnerability findings that provide direct, self-confirming evidence of exploitability.

In real-world evaluation on 169 production APKs, A2 uncovers 104 true-positive zero-day vulnerabilities. Among these, 57 (54.8%) are self-validated with automatically generated PoCs, including a medium-severity vulnerability in a widely used application with over 10 million installs.
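The discover-then-validate structure described above can be pictured with a minimal Python sketch. This is not A2's implementation: `Finding`, `Status`, `run_two_phase`, and the stub callbacks are hypothetical names introduced only for illustration. The sketch captures one idea from the abstract, namely that a finding stays speculative until a working PoC exists for it, at which point it becomes validated.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List, Optional


class Status(Enum):
    SPECULATIVE = "speculative"  # produced by the discovery phase
    VALIDATED = "validated"      # backed by a working PoC


@dataclass
class Finding:
    description: str
    status: Status = Status.SPECULATIVE
    poc: Optional[str] = None


def run_two_phase(
    app: str,
    discover: Callable[[str], Iterable[str]],
    try_generate_poc: Callable[[str, Finding], Optional[str]],
) -> List[Finding]:
    """Phase 1 lists speculative findings; phase 2 tries to validate each one."""
    findings = [Finding(description) for description in discover(app)]
    for finding in findings:
        poc = try_generate_poc(app, finding)
        if poc is not None:
            # Only a working PoC promotes a finding to validated.
            finding.poc = poc
            finding.status = Status.VALIDATED
    return findings


# Hypothetical stubs standing in for the two agentic phases.
def stub_discover(app: str) -> List[str]:
    return ["exported component accepts untrusted input", "weak cipher mode"]


def stub_poc(app: str, finding: Finding) -> Optional[str]:
    if "exported" in finding.description:
        return "poc-script-placeholder"
    return None


results = run_two_phase("example.apk", stub_discover, stub_poc)
for result in results:
    print(result.status.value, "-", result.description)
```

Keeping the unvalidated findings in the output, instead of discarding them, matches the abstract's distinction between speculative findings and the subset that is self-validated by PoCs.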

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21579/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21579/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/2508.21579/full.md

---
Source: https://tomesphere.com/paper/2508.21579