# Simple Conditional Spatial Query Mask Deformable Detection Transformer: A Detection Approach for Multi-Style Strokes of Chinese Characters

**Authors:** Tian Zhou, Wu Xie, Huimin Zhang, Yong Fan

PMC · DOI: 10.3390/s24030931 · Sensors (Basel, Switzerland) · 2024-01-31

## TL;DR

This paper proposes SCSQ-MDD, an anchor-free stroke detection method built on deformable DETR, which supplies the stroke category and position information a robotic arm needs to write Chinese characters in multiple styles.

## Contribution

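SCSQ-MDD extends deformable DETR in two ways: a mask prediction layer keeps the sampling points that actually contribute to the reference-point feature update and resamples the rest, and the decoder's content and spatial queries are separated so that prediction depends less on the content embedding.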

## Key findings

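- On a multi-style Chinese character stroke dataset, SCSQ-MDD improves mean average precision (mAP) by 3.8% over deformable DETR in the testing stage.
- Mean average recall (mAR) improves by 1.1% over deformable DETR in the same setting.
- The mask prediction layer filters sampling points by whether they contribute to the feature update and resamples those that do not, which addresses the randomness of the correlation calculation among reference points.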
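
The filter-and-resample idea in the last finding can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `filter_and_resample`, the score threshold, and the jitter-based resampling rule are assumptions made for the example, and the mask scores are passed in directly rather than predicted from a shallow feature map and encoder queries as the paper describes.

```python
import numpy as np

def filter_and_resample(offsets, scores, threshold=0.5, rng=None):
    """Keep sampling offsets with a high mask score and resample the rest.

    offsets: (K, 2) sampling offsets around one reference point.
    scores:  (K,) hypothetical mask scores in [0, 1].
    Rejected offsets are replaced by a jittered copy of a randomly chosen
    kept offset (an assumed resampling rule, chosen for illustration).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    offsets = np.asarray(offsets, dtype=float)
    scores = np.asarray(scores, dtype=float)

    keep = scores >= threshold
    if not keep.any():
        # Nothing passes the mask: leave the offsets unchanged.
        return offsets.copy(), keep

    kept = offsets[keep]
    n_bad = int((~keep).sum())
    out = offsets.copy()
    # Pick a donor among the kept points for every rejected point.
    donors = rng.integers(0, len(kept), size=n_bad)
    jitter = rng.normal(scale=0.1, size=(n_bad, 2))
    out[~keep] = kept[donors] + jitter
    return out, keep
```

With four offsets and scores `[0.9, 0.8, 0.1, 0.2]`, the first two offsets are returned unchanged and the last two are moved next to one of the kept offsets.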

## Abstract

In the Chinese character writing task performed by robotic arms, the stroke category and position information must be extracted through object detection. Detection algorithms based on predefined anchor frames have difficulty handling the differences among the many styles of Chinese character strokes. Deformable detection transformer (deformable DETR) algorithms do not use predefined anchor frames, but because the deformable attention module samples its points randomly, some sampling points are invalid and contribute nothing to the feature update of the current reference point. This slows the rate at which the vectors in the detection head learn stroke features. To address this problem, this paper proposes a new detection method for multi-style strokes of Chinese characters, called the simple conditional spatial query mask deformable DETR (SCSQ-MDD). Firstly, a mask prediction layer is jointly determined by the shallow feature map of the Chinese character image and the query vector of the transformer encoder; it is used to keep the points with actual contributions and resample the points without contributions, addressing the randomness of the correlation calculation among the reference points. Secondly, the content query and spatial query of the transformer decoder are separated, which relaxes the dependence of the prediction task on the content embedding. Finally, a detection model without predefined anchor frames is constructed based on the SCSQ-MDD. Experiments on a multi-style Chinese character stroke dataset evaluate the performance of the SCSQ-MDD. Compared with deformable DETR in the testing stage, the mean average precision (mAP) is improved by 3.8% and the mean average recall (mAR) is improved by 1.1%, illustrating the effectiveness of the proposed method.
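
The second step, separating content and spatial queries, can be illustrated with a small attention sketch. The abstract does not give the exact formulation, so the code below assumes a Conditional DETR style decoupling in which the attention logit is the sum of a content-to-content term and a spatial-to-spatial term, with no cross terms. The function names and tensor shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decoupled_attention_weights(content_q, spatial_q, content_k, spatial_k):
    """Attention weights from separately matched content and spatial parts.

    content_q, spatial_q: (Nq, d) query parts.
    content_k, spatial_k: (Nk, d) key parts.
    The logit is content_q . content_k + spatial_q . spatial_k, which equals
    the dot product of the concatenated [content, spatial] vectors.
    """
    dim = content_q.shape[-1] + spatial_q.shape[-1]
    logits = content_q @ content_k.T + spatial_q @ spatial_k.T
    return softmax(logits / np.sqrt(dim), axis=-1)
```

Because the two terms are computed independently, the spatial query can localize a region without relying on the content embedding, which is the kind of relaxed dependence the abstract refers to.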

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10857204/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10857204/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC10857204/full.md

---
Source: https://tomesphere.com/paper/PMC10857204