# Visual recognition in the wild by sampling deep similarity functions

**Authors:** Mikhail Usvyatsov, Konrad Schindler

arXiv: 1903.06837 · 2019-03-19

## TL;DR

This paper proposes similarity-based recognition with a Siamese CNN: a test instance is classified by measuring its similarity to training exemplars rather than by predicting class probabilities directly. The aim is to avoid overfitting under domain shift, and because a few randomly picked exemplars suffice, the procedure is efficient.

## Contribution

The paper introduces a Siamese CNN that is trained as a similarity function between pairs of training examples, so that objects in environments unknown at training time can be recognised by matching test samples to training exemplars.

## Key findings

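- The CNN embedding correctly recovers the relative similarities to arbitrary class exemplars in the training set.
- A few randomly picked training exemplars suffice for good predictions.
- Directly training a CNN to predict class probabilities is prone to overfitting to irrelevant correlations in the training set; similarity learning is explored as the alternative.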

## Abstract

Recognising relevant objects or object states in its environment is a basic capability for an autonomous robot. The dominant approach to object recognition in images and range images is classification by supervised machine learning, nowadays mostly with deep convolutional neural networks (CNNs). This works well for target classes whose variability can be completely covered with training examples. However, a robot moving in the wild, i.e., in an environment that is not known at the time the recognition system is trained, will often face _domain shift_: the training data cannot be assumed to exhaustively cover all the within-class variability that will be encountered in the test data. In that situation, learning is in principle possible, since the training set does capture the defining properties, and the dissimilarities, of the target classes. But directly training a CNN to predict class probabilities is prone to overfitting to irrelevant correlations between the class labels and the specific subset of the target class that is represented in the training set. We explore the idea of instead learning a Siamese CNN that acts as a similarity function between pairs of training examples. Class predictions are then obtained by measuring the similarities between a new test instance and the training samples. We show that the CNN embedding correctly recovers the relative similarities to arbitrary class exemplars in the training set, and that therefore a few randomly picked training exemplars are sufficient to achieve good predictions, making the procedure efficient.
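
The exemplar-matching step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `similarity` function is a hypothetical stand-in (an exponential of the negative Euclidean distance between hand-made 2-D feature vectors) for the learned Siamese CNN, and the names `predict` and `k_per_class`, the class labels, and the rule of averaging similarities per class are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


def similarity(a, b):
    """Hypothetical stand-in for a learned Siamese similarity.

    Maps the Euclidean distance between two feature vectors to (0, 1];
    identical vectors score 1.0. The paper learns this function with a CNN.
    """
    return math.exp(-math.dist(a, b))


def predict(query, train_set, k_per_class=3, rng=None, sim=similarity):
    """Classify `query` by its mean similarity to a few random exemplars per class.

    train_set: iterable of (feature_vector, label) pairs.
    Returns (predicted_label, {label: mean_similarity}).
    """
    rng = rng or random.Random(0)

    # Group the training exemplars by class label.
    by_class = defaultdict(list)
    for features, label in train_set:
        by_class[label].append(features)

    scores = {}
    for label, items in by_class.items():
        # Sample only a few exemplars instead of comparing against all of them.
        sample = rng.sample(items, min(k_per_class, len(items)))
        scores[label] = sum(sim(query, x) for x in sample) / len(sample)

    # The class whose sampled exemplars are most similar on average wins.
    return max(scores, key=scores.get), scores


# Toy data: two well-separated classes (labels are illustrative).
train_set = [
    ((0.0, 0.0), "closed"), ((0.2, 0.1), "closed"), ((0.1, 0.3), "closed"),
    ((3.0, 3.0), "open"), ((3.2, 2.9), "open"), ((2.8, 3.1), "open"),
]

label, scores = predict((2.5, 2.7), train_set, k_per_class=2,
                        rng=random.Random(42))
print(label, scores)
```

In this toy setting the prediction does not depend on which exemplars are drawn, because every exemplar of the correct class is closer to the query than any exemplar of the other class. That is the property the abstract attributes to the learned embedding when it says that a few randomly picked exemplars are sufficient.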

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.06837/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1903.06837/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1903.06837/full.md

---
Source: https://tomesphere.com/paper/1903.06837