How Shift Equivariance Impacts Metric Learning for Instance Segmentation
Josef Lorenz Rumberger, Xiaoyan Yu, Peter Hirsch, Melanie Dohmen, Vanessa Emanuela Guarino, Ashkan Mokarian, Lisa Mais, Jan Funke, Dagmar Kainmueller

TL;DR
This paper provides a formal analysis of how shift equivariance in CNNs affects metric learning for instance segmentation, revealing fundamental limits and practical guidelines for tile-based approaches.
Contribution
It offers the first comprehensive formal analysis of shift equivariance in encoder-decoder CNNs for metric learning, establishing capacity limits and conditions to avoid discontinuities.
Findings
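A standard encoder-decoder CNN that takes d-dimensional images as input, with l pooling layers and pooling factor f, can distinguish at most f^{dl} same-looking objects.
To prevent discontinuities at tile boundaries in tile-and-stitch approaches, the training output window must be larger than f^l.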
Empirical results on synthetic data support the theoretical analysis.
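The capacity bound and the periodicity behind it can be illustrated with a small sketch. This is a toy example and not the paper's architecture or code: `capacity` and `toy_encoder_decoder` are hypothetical helper names, the "network" is just l non-overlapping 1D max-pooling steps followed by l nearest-neighbour upsampling steps, and shifts are circular (`np.roll`) so that border effects do not matter.

```python
import numpy as np


def capacity(f: int, d: int, l: int) -> int:
    """Upper bound f^(d*l) on the number of distinguishable same-looking objects."""
    return f ** (d * l)


def toy_encoder_decoder(x: np.ndarray, f: int, l: int) -> np.ndarray:
    """Toy 1D stand-in: l max-pooling steps (factor f), then l upsampling steps."""
    y = x
    for _ in range(l):
        # Non-overlapping max-pooling with window and stride f.
        y = y.reshape(-1, f).max(axis=1)
    for _ in range(l):
        # Nearest-neighbour upsampling by factor f.
        y = np.repeat(y, f)
    return y


f, l = 2, 2
period = f ** l  # shifts by multiples of f^l commute with the toy network

rng = np.random.default_rng(0)
x = rng.random(period * 4)  # length must be divisible by f^l

out = toy_encoder_decoder(x, f, l)

# Shifting the input by f^l shifts the output by f^l (equivariance holds).
out_period = toy_encoder_decoder(np.roll(x, period), f, l)
equivariant_at_period = np.allclose(out_period, np.roll(out, period))

# Shifting the input by a single pixel does not, in general, shift the output.
out_one = toy_encoder_decoder(np.roll(x, 1), f, l)
equivariant_at_one = np.allclose(out_one, np.roll(out, 1))

print(capacity(f=2, d=2, l=3), equivariant_at_period, equivariant_at_one)
```

In this toy setting, the output is only equivariant to shifts that are multiples of f^l, which gives an intuition for why the number of distinguishable same-looking objects is bounded by the f^l distinct offsets per dimension, that is f^{dl} in d dimensions. For example, a 2D network with f = 2 and l = 3 would be bounded by 2^(2·3) = 64.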
Abstract
Metric learning has received conflicting assessments concerning its suitability for solving instance segmentation tasks. It has been dismissed as theoretically flawed due to the shift equivariance of the employed CNNs and their respective inability to distinguish same-looking objects. Yet it has been shown to yield state-of-the-art results for a variety of tasks, and practical issues have mainly been reported in the context of tile-and-stitch approaches, where discontinuities at tile boundaries have been observed. To date, neither of the reported issues has undergone thorough formal analysis. In our work, we contribute a comprehensive formal analysis of the shift equivariance properties of encoder-decoder-style CNNs, which yields a clear picture of what can and cannot be achieved with metric learning in the face of same-looking objects. In particular, we prove that a standard…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Image Processing and 3D Reconstruction
