Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition
Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

TL;DR
This paper introduces TASN, a trilinear attention sampling network that learns fine-grained image features by sampling hundreds of part proposals at high resolution and distilling them into a single global model, reporting state-of-the-art accuracy.
Contribution
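The paper proposes TASN, a trilinear attention sampling network that learns detailed features from hundreds of part proposals in a teacher-student manner, using three components: a trilinear attention module, an attention-based sampler, and a feature distiller.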
Findings
TASN achieves state-of-the-art performance on multiple datasets.
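Its trilinear attention module generates attention maps by modeling inter-channel relationships.
Distilling part features into a single global feature is intended to avoid the heavy computational cost of existing attention-based approaches.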
Abstract
Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, which often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals by Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. Specifically, TASN consists of 1) a trilinear attention module, which generates attention maps by modeling the inter-channel relationships, 2) an attention-based sampler which highlights attended parts with high resolution, and 3) a feature distiller, which distills part features into a global one by weight sharing and feature preserving strategies. Extensive experiments verify that TASN…
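The trilinear attention module described above can be illustrated with a minimal NumPy sketch. It assumes one plausible reading of "modeling the inter-channel relationships": the feature map is flattened to a matrix X of shape (channels, positions), a channel-by-channel relation matrix is computed from X and its transpose, and each attention map is a relation-weighted mix of the channels. The softmax normalizations, the function names, and the tensor shapes are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def trilinear_attention(features):
    """Turn a (c, h, w) feature map into c attention maps of shape (h, w).

    Hypothetical sketch: the normalization scheme is an assumption,
    not a detail taken from the abstract.
    """
    c, h, w = features.shape
    x = features.reshape(c, h * w)

    # Inter-channel relationships: a (c, c) matrix whose rows sum to 1.
    # Spatial softmax on x first (assumed), then softmax over channels.
    relation = softmax(softmax(x, axis=1) @ x.T, axis=1)

    # Each attention map is a convex combination of the channel maps,
    # weighted by how strongly the channels relate to each other.
    maps = relation @ x
    return maps.reshape(c, h, w)


# Random features stand in for the output of a convolutional backbone.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
maps = trilinear_attention(feats)
```

Because every row of the relation matrix sums to 1, each output map stays within the per-position range of the input channels. In a full pipeline, maps like these would drive the attention-based sampler, which is not sketched here.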
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
