Modular Transformer Architecture for Precision Agriculture Imaging

Brian Gopalan (1); Nathalia Nascimento (1); Vishal Monga (1) ((1) The Pennsylvania State University)

arXiv:2508.03751 · cs.CV · August 8, 2025

TL;DR

This paper introduces a modular transformer-based framework for weed segmentation in drone imagery, dynamically adapting to image quality issues like noise and blur to improve accuracy and efficiency in precision agriculture.

Contribution

It presents a novel quality-aware routing strategy that directs images to specialized transformer models based on degradation type, enhancing segmentation performance over traditional CNN methods.

Findings

01 Outperforms CNN-based methods in segmentation accuracy

02 Improves computational efficiency in weed segmentation

03 Effectively handles image degradation such as noise and blur

Abstract

This paper addresses the critical need for efficient and accurate weed segmentation from drone video in precision agriculture. A quality-aware modular deep-learning framework is proposed that handles common image degradation by analyzing quality conditions, such as blur and noise, and routing inputs through specialized pre-processing and transformer models optimized for each degradation type. The system first analyzes drone images for noise and blur using Mean Absolute Deviation and the Laplacian. Each image is then dynamically routed to one of three vision transformer models: a baseline for clean images, a modified transformer with Fisher Vector encoding for noise reduction, or another with an unrolled Lucy-Richardson decoder to correct blur. This novel routing strategy allows the system to outperform existing CNN-based methods in both segmentation quality and computational efficiency,…
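The quality-analysis and routing step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Mean Absolute Deviation and the Laplacian are combined, so everything specific here is an assumption rather than the authors' method: blur is scored by the variance of a 4-neighbour Laplacian response, noise by the mean absolute deviation of that same response, and the function names, route labels, and threshold values are hypothetical.

```python
import numpy as np


def laplacian_response(gray):
    """4-neighbour discrete Laplacian, evaluated on interior pixels only."""
    g = np.asarray(gray, dtype=np.float64)
    return (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
            - 4.0 * g[1:-1, 1:-1])


def blur_score(gray):
    """Variance of the Laplacian; low values suggest few sharp edges (blur)."""
    return float(laplacian_response(gray).var())


def noise_score(gray):
    """Mean absolute deviation of the Laplacian response.

    Assumption: pixel-level noise inflates this high-frequency statistic
    far more than ordinary scene texture does.
    """
    lap = laplacian_response(gray)
    return float(np.mean(np.abs(lap - lap.mean())))


def route(gray, noise_threshold=20.0, blur_threshold=5.0):
    """Pick a hypothetical model branch for a grayscale image.

    Thresholds are illustrative placeholders, not values from the paper.
    Noise is checked first because noise also raises the Laplacian variance
    and would otherwise hide blur-free but noisy frames.
    """
    if noise_score(gray) > noise_threshold:
        return "noisy"    # e.g. the Fisher Vector variant
    if blur_score(gray) < blur_threshold:
        return "blurry"   # e.g. the unrolled Lucy-Richardson variant
    return "clean"        # e.g. the baseline vision transformer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = 128.0 + 40.0 * rng.standard_normal((64, 64))
    print(route(frame))
```

In practice the two thresholds would need to be tuned on validation frames from the target camera, since both scores depend on resolution, bit depth, and scene texture.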
