Bilinear CNNs for Fine-grained Visual Recognition

Tsung-Yu Lin; Aruni RoyChowdhury; Subhransu Maji

arXiv:1504.07889 · cs.CV · June 2, 2017 · 66 cites

TL;DR

This paper introduces Bilinear CNNs, an architecture for fine-grained visual recognition that represents an image as a pooled outer product of features from two CNNs. The model captures localized feature interactions, can be trained end-to-end, and runs at 30 frames per second on an NVIDIA Titan X GPU.

Contribution

The paper proposes Bilinear CNNs for fine-grained recognition, evaluates them on the Caltech-UCSD birds, NABirds, FGVC aircraft, and Stanford cars datasets, and provides a systematic analysis and visualization of the networks.

Findings

01

Achieves 84.1% per-image accuracy on the Caltech-UCSD birds dataset

02

Reduces bilinear feature size by an order of magnitude without significant accuracy loss

03

Effective for texture and scene recognition

Abstract

We present a simple and effective architecture for fine-grained visual recognition called Bilinear Convolutional Neural Networks (B-CNNs). These networks represent an image as a pooled outer product of features derived from two CNNs and capture localized feature interactions in a translationally invariant manner. B-CNNs belong to the class of orderless texture representations but, unlike prior work, they can be trained in an end-to-end manner. Our most accurate model obtains 84.1%, 79.4%, 86.9% and 91.3% per-image accuracy on the Caltech-UCSD birds [67], NABirds [64], FGVC aircraft [42], and Stanford cars [33] datasets, respectively, and runs at 30 frames per second on an NVIDIA Titan X GPU. We then present a systematic analysis of these networks and show that (1) the bilinear features are highly redundant and can be reduced by an order of magnitude in size without significant loss in…
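
The "pooled outer product" described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `bilinear_pool`, the random arrays standing in for the two CNN feature maps, and the tiny shapes are all assumptions chosen for illustration, and sum pooling is assumed as the pooling operation. The second half checks the orderless property by shuffling spatial locations and confirming the descriptor does not change.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Sum-pool the outer product of two feature maps over all locations.

    feat_a: array of shape (H, W, M), stand-in for the first CNN's features.
    feat_b: array of shape (H, W, N), stand-in for the second CNN's features.
    Returns a flat descriptor of length M * N.
    """
    h, w, m = feat_a.shape
    n = feat_b.shape[-1]
    a = feat_a.reshape(h * w, m)  # one row per spatial location
    b = feat_b.reshape(h * w, n)
    # a.T @ b equals the sum over locations of outer(a[l], b[l]).
    return (a.T @ b).reshape(m * n)

# Hypothetical feature maps (random values, not real CNN outputs).
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((4, 4, 3))
feat_b = rng.standard_normal((4, 4, 5))
desc = bilinear_pool(feat_a, feat_b)

# Shuffle the 16 spatial locations identically in both maps.
perm = rng.permutation(16)
shuf_a = feat_a.reshape(16, 3)[perm].reshape(4, 4, 3)
shuf_b = feat_b.reshape(16, 5)[perm].reshape(4, 4, 5)
desc_shuffled = bilinear_pool(shuf_a, shuf_b)

# The descriptor ignores where each feature pair occurred.
print(desc.shape, np.allclose(desc, desc_shuffled))
```

Because the descriptor has length M × N, it grows quickly with the channel counts of the two networks, which gives some intuition for why the abstract's redundancy analysis focuses on reducing the feature size.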

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques