# Localized Query Attack Toward Transformer-Based Visible Object Detectors

**Authors:** Yang Wang, Ang Li, Zhen Yang, Xunyun Liu

PMC · DOI: 10.3390/s26061987 · Sensors (Basel, Switzerland) · 2026-03-23

## TL;DR

This paper introduces Localized Query Attack (LQA), an adversarial patch attack that disrupts transformer-based object detectors by interfering with both encoder self-attention and decoder cross-attention.

## Contribution

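LQA strengthens self-attention between the adversarial patch and foreground (object) regions rather than the whole image, and it reduces the influence of encoder outputs and residual components in the decoder's joint attention matrix so that the patch carries more relative weight.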

## Key findings

- LQA improves transfer attack performance by approximately 20% over the second-best method across various transformer-based detectors.
- LQA redirects model focus toward the adversarial patch by enhancing self-attention interactions between the patch and foreground regions.
- Real-world validations confirm the practical effectiveness of LQA.

## Abstract

Transformer-based detectors have demonstrated exceptional accuracy in visible-object detection tasks. However, adversarial patches, specific types of adversarial examples, can disrupt these detectors by introducing unrestricted perturbations into specific image regions. Traditional methodologies focus on placing patches directly on objects and increasing attention scores between the patch and all areas of the image to impair detector performance. Nevertheless, these approaches are suboptimal due to significant discrepancies between background and object features, which contradict optimization objectives. Moreover, they overlook the impact of cross-attention mechanisms on detection results. To address these limitations, we introduce a novel approach named Localized Query Attack (LQA), designed to interfere with both self-attention within the encoder and cross-attention in the decoder. Unlike conventional global interference methods, LQA targets object features specifically, enhancing self-attention interactions between the adversarial patch and foreground regions to redirect model focus toward the patch. In the context of decoder cross-attention, we compute the joint attention matrix connecting encoder outputs with object queries. By diminishing the influence of encoder outputs and residual components in this matrix, we amplify the relative importance of the adversarial patch, thereby intensifying the attack’s effectiveness. Our experiments show that LQA achieves an approximately 20% improvement in transfer attack performance compared to the second-best method across various transformer-based detectors. The practical efficacy of LQA is further substantiated through real-world scenario validations, underscoring its applicability.
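The quantity the abstract describes, how much attention foreground tokens and object queries place on the patch, can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it omits learned projections, multi-head structure, and the residual components mentioned above, and every name in it (`attention_weights`, `patch_attention_share`, `FOREGROUND`, `PATCH`, the toy token values) is a hypothetical chosen for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights; each row sums to 1.

    Assumption: learned query/key projections are omitted for brevity.
    """
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def patch_attention_share(attn, row_idx, patch_idx):
    """Mean attention mass that the selected rows place on the patch columns."""
    per_row = [sum(attn[i][j] for j in patch_idx) for i in row_idx]
    return sum(per_row) / len(per_row)


# Hypothetical toy layout: tokens 0-1 are foreground, 2 is background, 3 is the patch.
FOREGROUND, PATCH = [0, 1], [3]


def localized_share(tokens):
    """Encoder side: attention from foreground tokens only to the patch."""
    attn = attention_weights(tokens, tokens)
    return patch_attention_share(attn, FOREGROUND, PATCH)


def query_share(object_queries, tokens):
    """Decoder side: attention from every object query to the patch token."""
    attn = attention_weights(object_queries, tokens)
    return patch_attention_share(attn, range(len(object_queries)), PATCH)


# Same scene twice; only the patch token (index 3) differs.
weak_patch = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.5, 0.0]]
strong_patch = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [3.0, 0.0]]
object_queries = [[1.0, 0.0], [0.8, 0.1]]

print("encoder share:", localized_share(weak_patch), localized_share(strong_patch))
print("decoder share:", query_share(object_queries, weak_patch),
      query_share(object_queries, strong_patch))
```

In this toy setting, a patch token aligned with the foreground features draws a larger share of attention from both the foreground rows and the object queries. An actual attack would presumably maximize quantities of this kind by gradient-based optimization of the patch pixels through a real detector, which the sketch does not attempt.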

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030284/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030284/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030284/full.md

---
Source: https://tomesphere.com/paper/PMC13030284