Shape and Substance: Dual-Layer Side-Channel Attacks on Local Vision-Language Models

Eyal Hadad; Mordechai Guri

arXiv:2603.25403·cs.CR·March 30, 2026


TL;DR

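Dynamic high-resolution preprocessing (e.g., AnyRes) in local vision-language models leaks information through a dual-layer side channel: execution time reveals an input image's geometry, and Last-Level Cache (LLC) contention reveals whether its content is visually dense or sparse. On-device execution therefore does not by itself guarantee input privacy.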

Contribution

The paper introduces a dual-layer attack framework that exploits execution-time and cache-contention signals in VLMs with dynamic preprocessing, and it discusses mitigation strategies and their trade-offs.

Findings

01

The Tier 1 attack reliably fingerprints input geometry from execution-time variations visible through standard unprivileged OS metrics.

02

Tier 2 LLC contention profiling distinguishes visually dense content (e.g., medical X-rays) from sparse content (e.g., text documents) within identical geometries.

03

Mitigation via constant-work padding incurs significant performance overhead.
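
To make the cost in finding 03 concrete, here is a minimal Python sketch of one plausible reading of constant-work padding: every request is padded with filler tiles up to a fixed worst-case count, so the amount of work no longer depends on the input's geometry. The names `MAX_TILES`, `pad_to_constant_work`, and `overhead_factor`, and the value 5, are illustrative assumptions and are not taken from the paper.

```python
from typing import Any, List, Sequence

# Assumed worst-case number of tiles the preprocessor can emit (hypothetical value).
MAX_TILES = 5


def pad_to_constant_work(
    tiles: Sequence[Any], max_tiles: int = MAX_TILES, filler: Any = None
) -> List[Any]:
    """Return the tiles padded with filler entries up to max_tiles.

    Every input then produces the same number of tiles, so downstream
    work is independent of the original aspect ratio.
    """
    if len(tiles) > max_tiles:
        raise ValueError("input produced more tiles than the assumed worst case")
    return list(tiles) + [filler] * (max_tiles - len(tiles))


def overhead_factor(actual_tiles: int, max_tiles: int = MAX_TILES) -> float:
    """Ratio of padded work to the work the input actually required."""
    return max_tiles / actual_tiles


if __name__ == "__main__":
    padded = pad_to_constant_work(["t0", "t1", "t2"])
    print(len(padded), overhead_factor(3))
```

Under these assumptions, an image that needs only 3 tiles is processed as if it needed 5, which is the source of the overhead: the smaller the real workload, the larger the relative penalty.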

Abstract

On-device Vision-Language Models (VLMs) promise data privacy via local execution. However, we show that the architectural shift toward Dynamic High-Resolution preprocessing (e.g., AnyRes) introduces an inherent algorithmic side-channel. Unlike static models, dynamic preprocessing decomposes images into a variable number of patches based on their aspect ratio, creating input-dependent workloads. We demonstrate a dual-layer attack framework against local VLMs. In Tier 1, an unprivileged attacker exploits significant execution-time variations, observed through standard OS metrics, to reliably fingerprint the input's geometry. In Tier 2, by profiling Last-Level Cache (LLC) contention, the attacker can resolve semantic ambiguity within identical geometries, distinguishing between visually dense (e.g., medical X-rays) and sparse (e.g., text documents) content. By evaluating…
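
The geometry dependence described in the abstract can be illustrated with a short Python sketch of AnyRes-style grid selection. This is a simplified model under stated assumptions, not the paper's code or any specific model's implementation: the tile size `TILE`, the list `CANDIDATE_GRIDS`, the selection rule, and the extra global-view tile are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

TILE = 336  # assumed tile edge length in pixels (hypothetical)
# Assumed candidate tile grids as (columns, rows); hypothetical configuration.
CANDIDATE_GRIDS: List[Tuple[int, int]] = [(1, 2), (2, 1), (2, 2), (1, 3), (3, 1)]


def select_grid(width: int, height: int) -> Tuple[int, int]:
    """Pick the grid that keeps the most image pixels, then wastes the least canvas."""
    best_grid = None
    best_key = None
    for cols, rows in CANDIDATE_GRIDS:
        canvas_w, canvas_h = cols * TILE, rows * TILE
        # Scale the image to fit inside the canvas while preserving aspect ratio.
        scale = min(canvas_w / width, canvas_h / height)
        scaled_w, scaled_h = int(width * scale), int(height * scale)
        effective = min(scaled_w * scaled_h, width * height)
        wasted = canvas_w * canvas_h - effective
        key = (effective, -wasted)
        if best_key is None or key > best_key:
            best_grid, best_key = (cols, rows), key
    return best_grid


def tile_count(width: int, height: int) -> int:
    """Tiles sent to the vision encoder: the grid tiles plus one assumed global view."""
    cols, rows = select_grid(width, height)
    return cols * rows + 1


if __name__ == "__main__":
    for size in [(1000, 1000), (2000, 1000), (3000, 1000)]:
        print(size, tile_count(*size))
```

With these assumed settings, a 1000x1000 image maps to 5 tiles, a 2000x1000 image to 3, and a 3000x1000 image to 4. Because encoder work grows with the number of tiles, the aspect ratio alone changes how long the model runs, which is the kind of signal the Tier 1 attack relies on.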
