# Semantic Interaction Meta-Learning Based on Patch Matching Metric

**Authors:** Baoguo Wei, Xinyu Wang, Yuetong Su, Yue Zhang, Lixin Li

PMC · DOI: 10.3390/s24175620 · Sensors (Basel, Switzerland) · 2024-08-30

## TL;DR

This paper introduces a new meta-learning framework that improves few-shot image classification by combining patch-level visual features with semantic information.

## Contribution

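PatSiML (Patch Matching Metric-based Semantic Interaction Meta-Learning) pairs a Transformer-based patch matching metric with a label-assisted channel semantic interaction strategy to improve few-shot image classification.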

## Key findings

- PatSiML improves classification accuracy by 0.65% to 21.15% over existing methods on four datasets.
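- To counteract supervision collapse, a Transformer-based patch matching metric turns each image into a set of patch embeddings, and a graph convolutional network helps build task-specific embeddings for matching support classes against query image patches.
- A label-assisted channel semantic interaction strategy merges word embeddings with patch-level visual features across the channel dimension, so support classes are not characterized by visual features alone.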
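
The channel-wise merging of word embeddings and patch features mentioned in the last finding can be pictured with a small NumPy sketch. This is an illustration only, not the authors' implementation: the sigmoid gate, the projection matrix `w_proj`, and all shapes are assumptions made here, and the summary does not specify how the merge is actually computed.

```python
import numpy as np

def channel_semantic_interaction(patch_feats, word_emb, w_proj):
    """Reweight visual channels using a label word embedding (hypothetical form).

    patch_feats: (P, C) patch-level visual features for one support class
    word_emb:    (D,)   word embedding of the class label
    w_proj:      (D, C) hypothetical learned projection into channel space
    """
    # Project the word embedding to one value per visual channel,
    # then squash it to (0, 1) so it acts as a per-channel gate.
    gate = 1.0 / (1.0 + np.exp(-(word_emb @ w_proj)))  # shape (C,)
    # Broadcast the gate over all patches: same channel weights for every patch.
    return patch_feats * gate

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(16, 8))  # 16 patches, 8 channels (toy sizes)
word_emb = rng.normal(size=(5,))        # toy 5-dimensional label embedding
w_proj = rng.normal(size=(5, 8))        # random stand-in for learned weights
fused = channel_semantic_interaction(patch_feats, word_emb, w_proj)
print(fused.shape)
```

The point of the sketch is only that the semantic signal acts per channel rather than per patch, so every patch of a class is modulated by the same label-derived weights.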

## Abstract

Metric-based meta-learning methods have demonstrated remarkable success in the domain of few-shot image classification. However, their performance is significantly contingent upon the choice of metric and the feature representation for the support classes. Current approaches, which predominantly rely on holistic image features, may inadvertently disregard critical details necessary for novel tasks, a phenomenon known as “supervision collapse”. Moreover, relying solely on visual features to characterize support classes can prove to be insufficient, particularly in scenarios involving limited sample sizes. In this paper, we introduce an innovative framework named Patch Matching Metric-based Semantic Interaction Meta-Learning (PatSiML), designed to overcome these challenges. To counteract supervision collapse, we have developed a patch matching metric strategy based on the Transformer architecture to transform input images into a set of distinct patch embeddings. This approach dynamically creates task-specific embeddings, facilitated by a graph convolutional network, to formulate precise matching metrics between the support classes and the query image patches. To enhance the integration of semantic knowledge, we have also integrated a label-assisted channel semantic interaction strategy. This strategy merges word embeddings with patch-level visual features across the channel dimension, utilizing a sophisticated language model to combine semantic understanding with visual information. Our empirical findings across four diverse datasets reveal that the PatSiML method achieves a classification accuracy improvement of 0.65% to 21.15% over existing methodologies, underscoring its robustness and efficacy.
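
To make the idea of a patch-level matching metric concrete, the following NumPy sketch scores a query image against each support class by comparing patch embeddings instead of a single holistic feature. It is a simplified stand-in under stated assumptions: cosine similarity, the best-match-then-average aggregation, and the function names are choices made here, and the Transformer encoder and graph convolutional network described in the abstract are omitted entirely.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def patch_matching_scores(query_patches, support_patches):
    """Score one query image against N support classes (hypothetical metric).

    query_patches:   (Pq, C)    patch embeddings of the query image
    support_patches: (N, Ps, C) patch embeddings for each support class
    Returns an (N,) array; higher means a better match.
    """
    q = l2_normalize(query_patches)
    s = l2_normalize(support_patches)
    # Cosine similarity between every query patch and every support patch.
    sim = np.einsum("qc,nsc->nqs", q, s)  # shape (N, Pq, Ps)
    # For each query patch keep its best-matching support patch,
    # then average over query patches to get one score per class.
    return sim.max(axis=2).mean(axis=1)

rng = np.random.default_rng(0)
support = rng.normal(size=(3, 9, 32))  # 3 classes, 9 patches, 32 dims (toy sizes)
# Toy query: a slightly perturbed copy of class 1's patches.
query = support[1] + 0.01 * rng.normal(size=(9, 32))
scores = patch_matching_scores(query, support)
pred = int(np.argmax(scores))
print(scores.shape, pred)
```

Because each query patch is matched independently, local details that a single pooled image vector would average away can still contribute to the class score, which is the intuition behind using patches to counter supervision collapse.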

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11398163/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC11398163/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC11398163/full.md

---
Source: https://tomesphere.com/paper/PMC11398163