# In defense of local descriptor-based few-shot object detection

**Authors:** Shichao Zhou, Haoyan Li, Zhuowei Wang, Zekai Zhang

PMC · DOI: 10.3389/fnins.2024.1349204 · 2024-02-12

## TL;DR

This paper proposes a nearly learning-free few-shot object detection method that refines classical local descriptors into a brain-inspired feature representation.

## Contribution

A few-shot detection framework that enhances local descriptors with spatial contextual attention and a Kernel-InfoNCE loss.
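To make the idea of spatial contextual attention concrete, the following is a minimal sketch, not the authors' implementation. It assumes descriptors sit on a regular grid, that neighbor affinity is a dot product, and that each descriptor is replaced by a softmax-weighted average of its 3x3 neighborhood. The function name `refine_descriptors` and the `temperature` parameter are hypothetical.

```python
import math


def refine_descriptors(grid, temperature=1.0):
    """Refine each descriptor with a softmax-weighted average of its neighbours.

    grid: 2-D list (rows x cols) of equal-length descriptor vectors.
    Affinity is assumed to be a dot product; this is an illustrative choice.
    """
    rows, cols = len(grid), len(grid[0])
    refined = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            center = grid[r][c]
            # 3x3 neighbourhood (including the centre), clipped at the borders.
            neighbours = [
                grid[i][j]
                for i in range(max(0, r - 1), min(rows, r + 2))
                for j in range(max(0, c - 1), min(cols, c + 2))
            ]
            affinities = [
                sum(x * y for x, y in zip(center, n)) / temperature
                for n in neighbours
            ]
            # Numerically stable softmax over the affinities.
            peak = max(affinities)
            weights = [math.exp(a - peak) for a in affinities]
            total = sum(weights)
            out_row.append([
                sum(w * n[k] for w, n in zip(weights, neighbours)) / total
                for k in range(len(center))
            ])
        refined.append(out_row)
    return refined
```

In this sketch a descriptor surrounded by similar neighbors is left nearly unchanged, while an isolated one is pulled toward its spatial context.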

## Key findings

- Local descriptors can be improved with spatial contextual attention for better global structure understanding.
- On remote sensing images, the proposed method achieves effective few-shot detection without an intensive training stage.
- The model uses non-parametric similarity computation for accelerated detection.
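The non-parametric decision rule in the last finding can be illustrated with a small sketch. It assumes cosine similarity between a candidate window's embedding and the few support embeddings, followed by a fixed threshold. The similarity measure, the `detect` function, and the `threshold` value are assumptions for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect(window_embedding, support_embeddings, threshold=0.8):
    """Return (is_object, best_score) using the nearest support example.

    No parameters are learned: the decision depends only on the stored
    support embeddings and a hand-set threshold (hypothetical value).
    """
    best = max(cosine_similarity(window_embedding, s) for s in support_embeddings)
    return best >= threshold, best
```

For example, `detect([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]])` compares the window against two support examples and accepts it because the first points in the same direction.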

## Abstract

State-of-the-art image object detection computational models require an intensive parameter fine-tuning stage (using deep convolutional networks, etc.) with tens or hundreds of training examples. In contrast, human intelligence can robustly learn a new concept from just a few instances (i.e., few-shot detection). The distinct perception mechanisms of these two families of systems motivate us to revisit classical handcrafted local descriptors (e.g., SIFT, HOG, etc.) as well as non-parametric visual models, which innately require no learning/training phase. Herein, we claim that the inferior performance of these local descriptors mainly results from a lack of global structure sense. To address this issue, we refine local descriptors with spatial contextual attention of neighbor affinities and then embed the local descriptors into a discriminative subspace guided by a Kernel-InfoNCE loss. Differing from conventional quantization of local descriptors in high-dimensional feature space or isometric dimension reduction, we seek a brain-inspired few-shot feature representation for the object manifold, which combines data-independent primitive representation and semantic context learning and thus helps with generalization. The obtained embeddings, as pattern vectors/tensors, permit an accelerated but non-parametric visual similarity computation as the decision rule for final detection. Our approach to few-shot object detection is nearly learning-free, and experiments on remote sensing imagery (approximately a 2-D affine space) confirm the efficacy of our model.
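The abstract does not spell out the Kernel-InfoNCE loss. As a rough illustration only, the sketch below shows a generic InfoNCE objective in which the similarity score is a Gaussian kernel. The kernel choice, the `bandwidth` and `temperature` parameters, and the function names are assumptions and may differ from the paper's formulation.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def kernel_info_nce(anchor, positive, negatives, bandwidth=1.0, temperature=0.1):
    """InfoNCE loss with kernel similarities as logits (illustrative sketch).

    loss = -log( exp(k(a, p) / t) / sum_j exp(k(a, x_j) / t) ),
    where the sum runs over the positive and all negatives.
    """
    pos_logit = gaussian_kernel(anchor, positive, bandwidth) / temperature
    logits = [pos_logit] + [
        gaussian_kernel(anchor, n, bandwidth) / temperature for n in negatives
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_denominator - pos_logit
```

The loss is small when the anchor lies close to its positive and far from the negatives, which is the sense in which such an objective pushes embeddings toward a discriminative subspace.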

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10894920/full.md

---
Source: https://tomesphere.com/paper/PMC10894920