SciLT: Long-tailed Image Classification under Scientific Image Domains
Jiahao Chen, Bing Su

TL;DR
This paper introduces SciLT, a framework for scientific long-tailed image classification that leverages multi-level features and dual supervision to improve performance across class distributions.
Contribution
The paper proposes SciLT, a novel method that exploits multi-level representations and dual-supervision learning to enhance scientific long-tailed recognition.
Findings
Fine-tuning foundation models yields limited gains on three scientific benchmarks.
Penultimate-layer features play an important role, particularly for tail classes.
SciLT outperforms existing methods across multiple scientific benchmarks.
Abstract
Long-tailed recognition has benefited from foundation models and fine-tuning paradigms, yet existing studies and benchmarks are mainly confined to natural image domains, where pre-training and fine-tuning data share similar distributions. In contrast, scientific images exhibit distinct visual characteristics and supervision signals, raising questions about the effectiveness of fine-tuning foundation models in such settings. In this work, we investigate scientific long-tailed recognition under a purely visual and fine-tuning paradigm. Experiments on three scientific benchmarks show that fine-tuning foundation models yields limited gains, and reveal that penultimate-layer features play an important role, particularly for tail classes. Motivated by these findings, we propose SciLT, a framework that exploits multi-level representations through adaptive feature fusion and dual-supervision…
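The abstract is cut off before it says how the fusion or the dual supervision are defined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a scalar sigmoid gate that mixes penultimate-layer and final-layer feature vectors of equal length, and it assumes the two supervision signals are a plain cross-entropy and a logit-adjusted cross-entropy (a common long-tailed loss). All function names, the gate form, and the loss pairing are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(penultimate, final, gate_weights, gate_bias):
    """Hypothetical gated fusion of two feature levels.

    g = sigmoid(w . [penultimate; final] + b)
    fused = g * penultimate + (1 - g) * final
    Assumes both feature vectors have the same length.
    """
    concat = list(penultimate) + list(final)
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [g * p + (1.0 - g) * f for p, f in zip(penultimate, final)]
    return fused, g


def linear(features, weights):
    """Bias-free linear classifier: one weight row per class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(logits, label):
    return -log_softmax(logits)[label]


def logit_adjusted_ce(logits, label, class_counts, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior)."""
    total = sum(class_counts)
    adjusted = [z + tau * math.log(c / total) for z, c in zip(logits, class_counts)]
    return cross_entropy(adjusted, label)


def dual_loss(logits, label, class_counts, alpha=0.5):
    """Assumed 'dual supervision': weighted sum of the two losses above."""
    plain = cross_entropy(logits, label)
    balanced = logit_adjusted_ce(logits, label, class_counts)
    return alpha * plain + (1.0 - alpha) * balanced


# Toy usage: 2-dimensional features, 2 classes (90 head vs 10 tail samples).
penultimate = [1.0, 3.0]
final = [3.0, 1.0]
fused, gate = fuse(penultimate, final, gate_weights=[0.0] * 4, gate_bias=0.0)
logits = linear(fused, weights=[[1.0, 0.0], [0.0, 1.0]])
loss = dual_loss(logits, label=1, class_counts=[90, 10])
```

In a real model the two feature levels would usually have different dimensions and need a learned projection before mixing, and the gate parameters would be trained jointly with the classifier. The sketch leaves both out to keep the fusion and the two loss terms visible.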
