Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification

Murilo Gustineli; Anthony Miyaguchi; Adrian Cheung; Divyansh Khattak

arXiv:2507.06093·cs.CV·July 9, 2025

TL;DR

This paper describes a tile-based Vision Transformer inference pipeline with visual-cluster priors for zero-shot multi-species plant identification. It placed second in the PlantCLEF 2025 challenge with a macro-averaged F1 of 0.348 on the private leaderboard, without any additional training.

Contribution

It introduces a training-free inference pipeline that combines 4x4 tiled ViT inference with PaCMAP + K-Means visual clustering, geolocation filtering, and cluster-specific Bayesian priors.

Findings

01. Achieved macro-averaged F1 of 0.348 on private leaderboard

02. Utilized a 4x4 tiling strategy aligned with network receptive field

03. No additional training required for the proposed method
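
The 4x4 tiling in finding 02 can be illustrated with a minimal sketch. It only computes the crop boxes for a grid of tiles; the helper name `tile_boxes` and the box layout `(left, upper, right, lower)` are assumptions made for this example and are not taken from the authors' repository. In a full pipeline, each crop would presumably be resized to the model's 518x518 input before inference.

```python
from typing import List, Tuple

# (left, upper, right, lower), the same layout many image libraries use for crops.
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, grid: int = 4) -> List[Box]:
    """Return grid*grid crop boxes that cover the image without overlap.

    Hypothetical helper for illustration only. Tiles are listed row by row,
    left to right. Integer division spreads any remainder across tiles, so
    the boxes always cover the full image.
    """
    if grid < 1 or width < grid or height < grid:
        raise ValueError("image must be at least grid x grid pixels")
    xs = [i * width // grid for i in range(grid + 1)]
    ys = [j * height // grid for j in range(grid + 1)]
    return [
        (xs[i], ys[j], xs[i + 1], ys[j + 1])
        for j in range(grid)
        for i in range(grid)
    ]


# A 2072x2072 quadrat image splits into sixteen 518x518 tiles (4 * 518 = 2072).
boxes = tile_boxes(2072, 2072)
```

For images whose sides are not exact multiples of 518, the tiles come out larger or smaller than the network input, which is why a resize step per tile is assumed here.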

Abstract

We describe DS@GT's second-place solution to the PlantCLEF 2025 challenge on multi-species plant identification in vegetation quadrat images. Our pipeline combines (i) a fine-tuned Vision Transformer ViTD2PC24All for patch-level inference, (ii) a 4x4 tiling strategy that aligns patch size with the network's 518x518 receptive field, and (iii) domain-prior adaptation through PaCMAP + K-Means visual clustering and geolocation filtering. Tile predictions are aggregated by majority vote and re-weighted with cluster-specific Bayesian priors, yielding a macro-averaged F1 of 0.348 (private leaderboard) while requiring no additional training. All code, configuration files, and reproducibility scripts are publicly available at https://github.com/dsgt-arc/plantclef-2025.
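
The aggregation step in the abstract (majority vote over tiles, then re-weighting with a cluster-specific prior) can be sketched as below. The abstract does not give the exact formula, so this is one plausible reading and not the authors' implementation: votes are counted as the fraction of tiles that predicted a species, and the prior is applied multiplicatively and renormalized. The function names, the `floor` parameter, and the toy species labels are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List


def aggregate_tiles(tile_predictions: List[List[str]]) -> Dict[str, float]:
    """Return, for each species, the fraction of tiles that predicted it.

    Each inner list holds the species predicted for one tile (assumed here
    to be that tile's top-k labels). A species counts once per tile.
    """
    n_tiles = len(tile_predictions)
    if n_tiles == 0:
        return {}
    votes = Counter(s for tile in tile_predictions for s in set(tile))
    return {species: count / n_tiles for species, count in votes.items()}


def reweight_with_prior(
    scores: Dict[str, float],
    cluster_prior: Dict[str, float],
    floor: float = 1e-6,
) -> Dict[str, float]:
    """Multiply vote scores by a per-cluster species prior and renormalize.

    `floor` is a hypothetical smoothing value for species that are missing
    from the prior, so they are down-weighted rather than removed.
    """
    weighted = {s: v * cluster_prior.get(s, floor) for s, v in scores.items()}
    total = sum(weighted.values())
    if total == 0:
        return {}
    return {s: w / total for s, w in weighted.items()}


# Toy example: four tiles, three made-up species labels.
tiles = [["A", "B"], ["A"], ["B", "C"], ["A"]]
prior = {"A": 0.5, "B": 0.25, "C": 0.25}  # assumed P(species | cluster)
scores = aggregate_tiles(tiles)
posterior = reweight_with_prior(scores, prior)
```

In this toy case species A appears in three of four tiles and also has the highest prior, so it keeps the top rank after re-weighting. A final prediction set would then be chosen by thresholding or taking the top entries, which the sketch leaves out.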

Code & Models

Repositories

dsgt-arc/plantclef-2025 (official)

Taxonomy

Topics: Smart Agriculture and AI · Remote Sensing in Agriculture · Advanced Neural Network Applications

Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Transformer · Layer Normalization · Dense Connections · Vision Transformer