Learn from Foundation Model: Fruit Detection Model without Manual Annotation

Yanan Wang; Zhenghao Fei; Ruichen Li; Yibin Ying

arXiv:2411.16196 · cs.CV · March 24, 2026

TL;DR

This paper introduces SDM-D, a framework that leverages foundation models and knowledge distillation to train effective fruit detection models without manual annotations, achieving near-supervised performance.

Contribution

The paper presents a novel framework combining foundation models and knowledge distillation to train domain-specific fruit detection models without manual labels.

Findings

1. SDM-D nearly matches the performance of label-supervised models.
2. SDM-D outperforms open-set detection methods such as Grounding SAM and YOLO-World.
3. The paper introduces the MegaFruits dataset, with over 25,000 images.

Abstract

Recent breakthroughs in large foundation models have made it possible to transfer knowledge learned from vast datasets to domains with limited data availability. Agriculture is one such domain. This study proposes a framework to train effective, domain-specific, small models from foundation models without manual annotation. Our approach begins with SDM (Segmentation-Description-Matching), a stage that leverages two foundation models: SAM2 (Segment Anything in Images and Videos) for segmentation and OpenCLIP (Open Contrastive Language-Image Pretraining) for zero-shot open-vocabulary classification. In the second stage, a novel knowledge distillation mechanism is utilized to distill compact, edge-deployable models from SDM, enhancing both inference speed and perception accuracy. The complete method, termed SDM-D…
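
The matching step described in the abstract can be illustrated with a minimal sketch. It assumes each segmented region and each text description has already been embedded into a shared space (in the paper this role is played by SAM2 masks and OpenCLIP encoders; here the vectors are hand-made toy values). The function names, the descriptions, and the temperature value are all hypothetical and are not taken from the authors' repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=0.01):
    """Numerically stable softmax; the temperature is an assumed value."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_masks_to_descriptions(mask_embeddings, text_embeddings, descriptions):
    """Assign each mask the description whose embedding is most similar.

    Returns one (description, probability) pair per mask.
    """
    matches = []
    for emb in mask_embeddings:
        sims = [cosine(emb, text) for text in text_embeddings]
        probs = softmax(sims)
        best = max(range(len(descriptions)), key=lambda i: probs[i])
        matches.append((descriptions[best], probs[best]))
    return matches


# Toy stand-ins for encoder outputs (assumption: real embeddings would come
# from an image encoder applied to mask crops and a text encoder applied to
# the descriptions).
descriptions = ["ripe strawberry", "unripe strawberry", "leaf"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mask_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

pseudo_labels = match_masks_to_descriptions(
    mask_embeddings, text_embeddings, descriptions
)
for name, prob in pseudo_labels:
    print(f"{name}: {prob:.3f}")
```

Labels produced this way could then serve as training targets for a compact student model, which is one plausible reading of the distillation stage; the abstract does not specify the mechanism in detail.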

Code & Models

Repositories

agroboticsresearch/sdm-d
PyTorch · Official

Taxonomy

Topics: Smart Agriculture and AI · Vehicle License Plate Recognition · Soil and Land Suitability Analysis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Segment Anything Model · Knowledge Distillation