CornViT: A Multi-Stage Convolutional Vision Transformer Framework for Hierarchical Corn Kernel Analysis
Sai Teja Erukude, Jane Mascarenhas, Lior Shamir

TL;DR
CornViT is a three-stage Convolutional Vision Transformer (CvT) framework that automates hierarchical corn kernel analysis (purity, then morphology, then embryo orientation). It exceeds 93% accuracy on purity classification and ships with a deployable web application for seed quality assessment.
Contribution
This work introduces a multi-stage CvT framework together with three curated, stage-specific datasets, and reports higher accuracy than ResNet-50 and DenseNet-121 on corn kernel grading tasks.
Findings
Achieved over 93% accuracy in purity classification
Outperformed ResNet-50 and DenseNet-121 in accuracy
Released the stage-specific datasets as public benchmarks and provided a web application
Abstract
Accurate grading of corn kernels is critical for seed certification, directional seeding, and breeding, yet it is still predominantly performed by manual inspection. This work introduces CornViT, a three-stage Convolutional Vision Transformer (CvT) framework that emulates the hierarchical reasoning of human seed analysts for single-kernel evaluation. Three sequential CvT-13 classifiers operate on 384x384 RGB images: Stage 1 distinguishes pure from impure kernels; Stage 2 categorizes pure kernels into flat and round morphologies; and Stage 3 determines the embryo orientation (up vs. down) for pure, flat kernels. Starting from a public corn seed image collection, we manually relabeled and filtered images to construct three stage-specific datasets: 7265 kernels for purity, 3859 pure kernels for morphology, and 1960 pure-flat kernels for embryo orientation, all released as benchmarks.…
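The hierarchical routing described in the abstract can be sketched as a simple cascade in which each later stage runs only when the earlier stage permits it. This is a minimal illustration, not the authors' implementation: the names `KernelGrade` and `grade_kernel` are hypothetical, and each classifier is assumed to be any callable that maps an image to a label string (in practice, for example, a wrapper around a fine-tuned CvT-13 model).

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Assumption: a classifier is any callable that takes an image and returns a label.
Classifier = Callable[[Any], str]


@dataclass
class KernelGrade:
    """Hypothetical result container for one kernel."""
    purity: str                    # "pure" or "impure" (Stage 1)
    shape: Optional[str] = None    # "flat" or "round"; set only for pure kernels (Stage 2)
    embryo: Optional[str] = None   # "up" or "down"; set only for pure, flat kernels (Stage 3)


def grade_kernel(
    image: Any,
    purity_clf: Classifier,
    shape_clf: Classifier,
    embryo_clf: Classifier,
) -> KernelGrade:
    """Route one kernel image through the three stages in order."""
    purity = purity_clf(image)
    if purity != "pure":
        # Impure kernels stop at Stage 1.
        return KernelGrade(purity=purity)

    shape = shape_clf(image)
    if shape != "flat":
        # Round kernels stop at Stage 2.
        return KernelGrade(purity=purity, shape=shape)

    # Only pure, flat kernels reach Stage 3.
    return KernelGrade(purity=purity, shape=shape, embryo=embryo_clf(image))


if __name__ == "__main__":
    # Stub classifiers stand in for trained models.
    result = grade_kernel(
        image=None,
        purity_clf=lambda img: "pure",
        shape_clf=lambda img: "flat",
        embryo_clf=lambda img: "up",
    )
    print(result)
```

One consequence of this design is that the later stages see progressively fewer kernels, which matches the shrinking dataset sizes reported above (7265, then 3859, then 1960).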
Taxonomy
Topics: Smart Agriculture and AI · Remote Sensing in Agriculture · Spectroscopy and Chemometric Analyses
