Bilinear CNNs for Fine-grained Visual Recognition
Tsung-Yu Lin, Aruni RoyChowdhury, Subhransu Maji

TL;DR
This paper introduces Bilinear CNNs (B-CNNs), an architecture for fine-grained visual recognition that represents an image as a pooled outer product of features from two CNNs. The representation captures localized feature interactions, can be trained end-to-end, and the most accurate model runs at 30 frames per second on an NVIDIA Titan X GPU.
Contribution
The paper proposes B-CNNs for fine-grained recognition and evaluates them on four datasets (Caltech-UCSD birds, NABirds, FGVC aircraft, and Stanford cars). It also presents a systematic analysis of the networks, with visualizations, and applies them to other tasks such as texture and scene recognition.
Findings
Achieves 84.1% per-image accuracy on the Caltech-UCSD birds dataset
Reduces bilinear feature size by an order of magnitude without significant loss in accuracy
Effective for texture and scene recognition
Abstract
We present a simple and effective architecture for fine-grained visual recognition called Bilinear Convolutional Neural Networks (B-CNNs). These networks represent an image as a pooled outer product of features derived from two CNNs and capture localized feature interactions in a translationally invariant manner. B-CNNs belong to the class of orderless texture representations but unlike prior work they can be trained in an end-to-end manner. Our most accurate model obtains 84.1%, 79.4%, 86.9% and 91.3% per-image accuracy on the Caltech-UCSD birds [67], NABirds [64], FGVC aircraft [42], and Stanford cars [33] datasets, respectively, and runs at 30 frames-per-second on an NVIDIA Titan X GPU. We then present a systematic analysis of these networks and show that (1) the bilinear features are highly redundant and can be reduced by an order of magnitude in size without significant loss in…
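The "pooled outer product" in the abstract can be illustrated with a minimal NumPy sketch. It assumes the two CNNs have already produced local features at the same set of image locations, and it uses sum pooling over locations. The function name `bilinear_pool`, the array shapes, and the random stand-in features are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def bilinear_pool(feat_a, feat_b):
    """Sum-pool the outer products of two local feature sets.

    feat_a: array of shape (L, Ca), features from the first network
    feat_b: array of shape (L, Cb), features from the second network
    Both must describe the same L image locations.
    Returns a flat descriptor of length Ca * Cb.
    """
    feat_a = np.asarray(feat_a, dtype=np.float64)
    feat_b = np.asarray(feat_b, dtype=np.float64)
    if feat_a.shape[0] != feat_b.shape[0]:
        raise ValueError("both feature sets must cover the same locations")
    # Summing outer(feat_a[l], feat_b[l]) over all locations l
    # is the same as the matrix product feat_a^T @ feat_b.
    pooled = feat_a.T @ feat_b  # shape (Ca, Cb)
    return pooled.reshape(-1)


# Hypothetical stand-in features: 49 locations, 8 and 6 channels.
rng = np.random.default_rng(0)
a = rng.standard_normal((49, 8))
b = rng.standard_normal((49, 6))
descriptor = bilinear_pool(a, b)

# Shuffling the locations (identically for both networks) leaves the
# descriptor unchanged, which is what "orderless" refers to.
perm = rng.permutation(49)
shuffled = bilinear_pool(a[perm], b[perm])
```

Because the pooling sums over locations, the descriptor depends only on which feature pairs co-occur, not on where they occur. Its length is the product of the two channel counts, which is why the size of the bilinear feature is a practical concern.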
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
