Cross-Layer Cache Aggregation for Token Reduction in Ultra-Fine-Grained Image Recognition
Edwin Arkel Rios, Jansen Christopher Yuanda, Vincent Leon Ghanz, Cheng-Wei Yu, Bo-Cheng Lai, Min-Chun Hu

TL;DR
This paper introduces a novel cross-layer cache and aggregation method to improve token reduction in ultra-fine-grained image recognition, enabling significant computational savings while maintaining high accuracy.
Contribution
It proposes a new cross-layer cache mechanism and aggregation head that recover information lost during token reduction in Vision Transformers for UFGIR.
Findings
Achieves high accuracy with only 10% of tokens kept.
Demonstrates effectiveness across multiple datasets and backbones.
Reduces computational cost significantly while maintaining competitive performance.
Abstract
Ultra-fine-grained image recognition (UFGIR) is a challenging task that involves classifying images within a macro-category. While traditional FGIR deals with classifying different species, UFGIR goes further by classifying sub-categories within a species, such as cultivars of a plant. Recently, Vision Transformer-based backbones have allowed methods to obtain outstanding recognition performance in this task, but this comes at a significant computational cost, especially since the task benefits significantly from higher-resolution images. Techniques such as token reduction have therefore emerged to reduce the computational cost. However, dropping tokens leads to a loss of information that is essential for fine-grained categories, especially as the token keep rate is reduced. Therefore, to counteract the loss of information brought by the usage of token…
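To make the keep-rate idea in the abstract concrete, here is a minimal sketch of generic score-based token dropping. It is not the paper's cross-layer cache or aggregation head, which this page does not describe in detail. The function name `keep_top_tokens`, the random importance scores, and the 784-token, 192-dimension shapes are all assumptions chosen for illustration.

```python
import numpy as np

def keep_top_tokens(tokens, scores, keep_rate):
    """Generic top-k token dropping (illustrative, not the paper's method).

    tokens: (num_tokens, dim) array of token embeddings.
    scores: (num_tokens,) array of importance scores (hypothetical).
    keep_rate: fraction of tokens to retain, e.g. 0.10.
    """
    num_tokens = tokens.shape[0]
    num_keep = max(1, int(round(num_tokens * keep_rate)))
    # Take the indices of the num_keep largest scores, then sort them so the
    # surviving tokens stay in their original sequence order.
    keep_idx = np.sort(np.argsort(scores)[-num_keep:])
    return tokens[keep_idx], keep_idx

rng = np.random.default_rng(0)
tokens = rng.normal(size=(784, 192))  # assumed: 28x28 patches, 192-dim embeddings
scores = rng.random(784)              # assumed: placeholder importance scores
kept, idx = keep_top_tokens(tokens, scores, keep_rate=0.10)
print(kept.shape)  # (78, 192)
```

With a 10% keep rate, 706 of the 784 tokens are discarded here. Whatever fine-grained detail those tokens carried is unavailable to later layers, which is the information loss the abstract says the proposed method aims to counteract.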
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection
