A Simple Fix for Convolutional Neural Network via Coordinate Embedding
Liliang Ren, Zhuonan Hao

TL;DR
This paper introduces a simple coordinate embedding method to enhance CNNs by incorporating pixel coordinate information, improving their ability to handle affine transformations and increasing robustness.
Contribution
The paper proposes a straightforward coordinate embedding technique that can be applied to existing CNNs without architectural changes, boosting performance and robustness.
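As a rough illustration of how coordinate information can be injected without touching the downstream network, the sketch below adds a linear embedding of normalized pixel coordinates to the input image, so the channel count the CNN sees is unchanged. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the linear form of the embedding, and the `weight` projection matrix are all hypothetical, and the summary above does not specify how the paper computes its embedding.

```python
import numpy as np


def coordinate_grid(height, width):
    """Return a (2, H, W) array of y and x coordinates normalized to [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy, xx], axis=0)


def add_coordinate_embedding(image, weight):
    """Add a linear embedding of pixel coordinates to a (C, H, W) image.

    `weight` is a hypothetical (C, 2) projection matrix (an assumption made
    for this sketch); in practice it could be learned with the model.
    The output keeps the input shape, so an existing CNN can consume it as-is.
    """
    _, height, width = image.shape
    coords = coordinate_grid(height, width)                # (2, H, W)
    embedding = np.einsum("ck,khw->chw", weight, coords)   # (C, H, W)
    return image + embedding
```

Because the output has the same shape as the input, a pre-trained model could in principle be fed the embedded image directly, which is one way to read the claim that no architectural change is required.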
Findings
Significant performance improvements on traffic sign detection (German Traffic Sign Detection Benchmark).
Enhanced robustness to affine transformations.
Easy integration with pre-trained models.
Abstract
Convolutional Neural Networks (CNNs) have been widely applied in computer vision. However, because CNN models are translation invariant, they are not aware of the coordinates of each pixel. This limits their generalization ability, since coordinate information is crucial for a model to learn affine transformations, which operate directly on the coordinates of each pixel. In this project, we propose a simple approach that incorporates coordinate information into the CNN model through coordinate embedding. Our approach does not change the downstream model architecture and can be easily applied to pre-trained models for tasks such as object detection. Our experiments on the German Traffic Sign Detection Benchmark show that our approach not only significantly improves model performance but also has better robustness with respect to…
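To make concrete what it means for an affine transformation to operate on pixel coordinates, here is a small sketch that applies a 2x3 affine matrix to a set of (x, y) positions. The function name and the example matrix are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np


def affine_transform_coords(coords, matrix):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) coordinates.

    The left 2x2 block is the linear part (rotation, scale, shear) and the
    last column is the translation.
    """
    coords = np.asarray(coords, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    linear, offset = matrix[:, :2], matrix[:, 2]
    return coords @ linear.T + offset


# Hypothetical example: rotate by 90 degrees, then translate by (2, 3).
corners = np.array([[0, 0], [1, 0], [0, 1]])
rotate_and_shift = [[0, -1, 2],
                    [1, 0, 3]]
moved = affine_transform_coords(corners, rotate_and_shift)
```

The transformation is defined entirely in terms of where each pixel sits, not what value it holds, which is why a model with no access to coordinates has little to work with when learning it.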
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques
