Towards a Decentralized, Autonomous Multiagent Framework for Mitigating Crop Loss

Roi Ceren; Shannon Quinn; Glen Rains

arXiv:1901.02035·cs.AI·January 9, 2019

TL;DR

This paper introduces a decentralized multiagent system using decision-theoretic methods and reinforcement learning to identify crop stress efficiently across multiple sensor layers, reducing computational costs in real-time agricultural monitoring.

Contribution

It presents the Agricultural Distributed Decision Framework (ADDF), integrating heterogeneous sensors and a novel reinforcement learning approach for online crop stress detection.

Findings

01

Effective multi-layer decision system for crop stress detection

02

Reinforcement learning improves decision efficiency

03

System reduces unnecessary sensor data processing

Abstract

We propose a generalized decision-theoretic system for a heterogeneous team of autonomous agents who are tasked with online identification of phenotypically expressed stress in crop fields. This system employs four distinct types of agents, specific to four available sensor modalities: satellites (Layer 3), uninhabited aerial vehicles (L2), uninhabited ground vehicles (L1), and static ground-level sensors (L0). Layers 3, 2, and 1 are tasked with performing image processing at the available resolution of the sensor modality and, along with data generated by layer 0 sensors, identify erroneous differences that arise over time. Our goal is to limit the use of the more computationally and temporally expensive subsequent layers. Therefore, from layer 3 to 1, each layer only investigates areas that previous layers have identified as potentially afflicted by stress. We introduce a…

Tables

Table 1: Q-learning baseline, varying the observation and sector counts.

$|O|$	Agent	True Positive	True Negative	False Positive	False Negative	Overall Accuracy
3	Fast	5	37,416	2	37,577	49.9%
3	Slow	2	3	1	1	71.4%
5	Fast	64	37,695	3	37,238	50.3%
5	Slow	35	4	1	27	56.3%

Table 2: ADDF with $k=500$.

$|O|$	Agent	True Positive	True Negative	False Positive	False Negative	Overall Accuracy
3	Fast	26,068	34,032	3,671	11,229	80.1%
3	Slow	13,067	11,895	4,068	709	83.9%
5	Fast	20,756	36,737	3,732	14,775	75.6%
5	Slow	12,052	7,612	701	4,123	80.3%

Table 3: The baseline and ADDF algorithms utilizing increased workload heuristics to employ the slow agent more frequently.

Method	$|O|$	Agent	True Positive	True Negative	False Positive	False Negative	Overall Accuracy
Baseline	3	Fast	30,552	26,445	14,202	3,801	76%
Baseline	3	Slow	22,238	13,629	5,110	5,023	78%
Baseline	5	Fast	25,249	28,579	12,067	9,105	71.8%
Baseline	5	Slow	17,061	15,549	4,927	7,463	72.5%
ADDF	3	Fast	30,703	31,355	2,327	10,615	82.7%
ADDF	3	Slow	24,260	13,488	4,936	1,415	83.9%
ADDF	5	Fast	28,025	31,995	6,239	8,741	80%
ADDF	5	Slow	17,865	18,242	3,991	4,898	80.2%

Taxonomy

Topics: Smart Agriculture and AI · Greenhouse Technology and Climate Control · Evolutionary Algorithms and Applications

Full text

Towards a Decentralized, Autonomous Multiagent Framework for Mitigating Crop Loss

Roi Ceren

Dept. Computer Science

University of Georgia


Shannon Quinn

Dept. Computer Science

University of Georgia


Glen Rains

Dept. Entomology

University of Georgia


(October 2018)

Abstract

We propose a generalized decision-theoretic system for a heterogeneous team of autonomous agents who are tasked with online identification of phenotypically expressed stress in crop fields. This system employs four distinct types of agents, specific to four available sensor modalities: satellites (Layer 3), uninhabited aerial vehicles (L2), uninhabited ground vehicles (L1), and static ground-level sensors (L0). Layers 3, 2, and 1 are tasked with performing image processing at the available resolution of the sensor modality and, along with data generated by layer 0 sensors, identify erroneous differences that arise over time. Our goal is to limit the use of the more computationally and temporally expensive subsequent layers. Therefore, from layer 3 to 1, each layer only investigates areas that previous layers have identified as potentially afflicted by stress. We introduce a reinforcement learning technique based on Perkins’ Monte Carlo Exploring Starts for a generalized Markovian model for each layer’s decision problem, and label the system the Agricultural Distributed Decision Framework (ADDF). As our domain is real-world and online, we illustrate implementations of the two major components of our system: a clustering-based image processing methodology and a two-layer POMDP implementation.

1 Introduction

The low-cost availability of imaging technology has given rise to the rapidly developing field of precision agriculture, often marked by the use of multispectral image collection via autonomous uninhabited aerial vehicles (AUAVs) [25]. As an example, recent efforts combining AUAVs, normalized difference vegetation index (NDVI) imaging, and environmental barometric and water potential sensors have been used to create efficient autonomous systems for targeted crop field watering [21]. Additionally, many targeted image processing systems have been developed for the purpose of specific disease identification based on phenotypic expression, such as lesions, browning, and tumors [11].

While precision watering techniques have dramatically improved yields for large-scale farms, autonomous intervention against disease propagation is still nascent [11]. Although some generalized models exist to detect these stresses, they have not yet been integrated into distributed autonomous systems in the way precision watering has. To that end, we adopt the problem of identifying and predicting the onset of stresses (pest and pathogen) in crop fields via environmental sensor and image data, taken at various resolutions throughout a growing season. We factor the functional capabilities of our physical system into four distinct layers, comprising satellites (layer 3), AUAVs (L2), autonomous uninhabited ground vehicles (AUGVs, L1), and static ground-level sensors (L0). Generally, the output of each layer (excluding L0) is used to inform the decision making of the layer below it by raising a call-to-action, wherein the layer believes a stress is occurring based on phenotypic expression that differs in a geographical location.

A common challenge of real-world problem domains, particularly in the agricultural domain, is the constraint on sample availability for machine learning. Since we are attempting to uncover the true state of stress in a crop field without prior knowledge, we propose a model-free exploration of policies a la Perkins’ Monte Carlo Exploring Starts (MCES) for Partially Observable Markov Decision Processes (POMDPs), labeled MCES-P [13]. MCES-P iterates over memory-less policies that directly map observations to actions, instead of mapping beliefs over the state [23]. As a first best effort towards our goal, we assemble our sensor modalities into a heterogeneous team, utilize an image processing technique to extract potentially stressed sectors, and learn policies that map these observations of phenotypic deviations to calls for intervention.

This work is divided into the following sections. Section 2 covers the related topics in precision agriculture. Section 3 introduces the necessary concepts for the algorithms used in Sec. 5. Before covering the framework, Sec. 4 describes the problem domain, including descriptions of our crop fields and the arrangement of our available physical sensor modalities. We then test prototypical experiments of our real-world domain in Sec. 6.

2 Related Work

Our approach falls under the body of work characterized by the category of precision agriculture, tackled by a variety of fields, including agriculture, agronomics, computer science, robotics, engineering, and physics. In particular, the relevant subtopics we explore include disease detection, nutrient deficiency, and insufficient water potential. This data provides a basis for precision agro-management, such as through spot spraying, targeted water irrigation and nitrogen application.

The most recent advance in precision agriculture is the FarmBeats initiative driven by Microsoft AI [21], in which a variety of network-accessible sensor modalities, including soil water potential sensors and AUAVs, are arranged to provide automated and targeted water intervention. This methodology is powerful for tackling stresses due to underwatering, but is incapable of detecting the presence of pathogens, pests and nutrient deficiency, which express themselves phenotypically.

Concerning the goal of disease identification and intervention, the wide array of contemporary efforts leverage phenotypic expressions of stress largely via thermal detection [8] and are often specific to the expression from a specific disease [11]. What remains is a generalized model that encompasses the variety of stresses in a model-free way. That is, instead of seeking a particular expression, learn the correlation between erroneous growth patterns, leaf and fruit necrosis or chlorosis, leaf spots, leaf striations and wilting (as caused by stress) and the available image and environmental data.

3 Background

In this section we cover the state of the art on the two major components of our work: modeling the temporal evolution of systems via image processing and employing model-free reinforcement learning in Markov models.

3.1 Image Processing

The availability of high-dimensional multispectral image data of crop fields in the last few years [3] has dramatically increased the development of computational systems designed to analyze and interpret crop image data. In parallel, NDVI was established as a powerful metric for image data, as it computes visual attributes of crop fields while eliminating non-vegetative properties [16].

Temporally-evolving image processing of crop fields via NDVI imaging is a nascent and quickly growing field [14]. Contemporary work largely focuses on retrospective curve-fitting, as in time series analysis on Advanced Very-High-Resolution Radiometer (AVHRR) [12] and Moderate Resolution Imaging Spectroradiometer (MODIS) [15] data, with several focusing on the root-mean-squared deviation (RMSD) metric over pairwise pixel differences as an image comparison methodology [3].

As our system is online, and therefore must make immediate estimates of possibly early or ongoing crop stress, retrospective models do not satisfy our needs. Therefore, we instead focus on adapting these methods to online settings, exploring methodologies using pairwise image differences as a sufficient metric.

3.2 Model-Free Markov Models

Monte Carlo Exploring Starts for POMDPs (MCES-P) [13] extends MCES for MDPs [18], an online implementation of reinforcement learning that explores locally neighboring policies. Algorithm 1 describes the MCES-P process.

In Algorithm 1, $\tau$ is a trajectory of observation, action, and reward tuples $\{(o_{i},a_{i},r_{i})\}_{i=0}^{|\tau|}$.

MCES-P performs round robin transformations over observation sequences and actions via sample action approximation (SAA) [9]. After each transformation, MCES-P compares the current transformed policy against the current best policy and, if it dominates, accepts the transformation as the current best policy.

After each $\tau$ is generated, MCES-P performs a Q-learning update to the current value of the observation sequence-action pair with a decaying learning rate, $\alpha(c)=1/(c+1)$ [22]. When comparing each policy against the current best policy, MCES-P ensures that both policies have been sampled at least $k$ times. To accomplish this, $\epsilon$ is simply defined as $\epsilon(k,p,q)=+\infty$ if $i<k$ or $j<k$, where $i$ and $j$ are the sample counts of the two policies, and [math] otherwise.
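The update rule and the sample-count gate described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `ActionValueTable` and its methods are hypothetical, and $c$ is assumed to be the number of updates already applied to the pair. Under that assumption, the learning rate $\alpha(c)=1/(c+1)$ makes each estimate the running mean of the returns observed so far.

```python
from collections import defaultdict


class ActionValueTable:
    """Running value estimates for (observation, action) pairs (hypothetical helper)."""

    def __init__(self):
        self.q = defaultdict(float)    # current estimate per (observation, action)
        self.count = defaultdict(int)  # number of updates applied per pair

    def update(self, obs, action, ret):
        """Apply one update with learning rate alpha(c) = 1 / (c + 1)."""
        key = (obs, action)
        c = self.count[key]
        alpha = 1.0 / (c + 1)
        self.q[key] += alpha * (ret - self.q[key])
        self.count[key] = c + 1

    def comparable(self, key_a, key_b, k):
        """True only when both pairs have been sampled at least k times."""
        return self.count[key_a] >= k and self.count[key_b] >= k


table = ActionValueTable()
for ret in (1.0, 2.0, 3.0):
    table.update("o1", "call", ret)
print(table.q[("o1", "call")])  # running mean of the three returns
```

Gating comparisons on `comparable` plays the role of setting $\epsilon=+\infty$ until both candidates have $k$ samples: no transformation can be accepted before then.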

4 Problem Domain

In this work, we propose a novel aggregate framework for utilizing multispectral images and environmental metadata in heterogeneous teams of sensor modalities. In our application, we will estimate the probability of the presence of disease, nitrogen, or water stress, including identifying markers that indicate the onset of these stresses. The resultant framework is called the Agricultural Distributed Decision Framework (ADDF). ADDF tackles individual and team learning and planning, as well as approaches for accomplishing individual tasks, representing our task of organizing satellites, autonomous uninhabited aerial vehicles (AUAVs), autonomous uninhabited ground vehicles (AUGVs), and ground-level environmental sensors, specified in Fig. 2.

Beyond sensor modalities, the environmental domain (used in Sec. 6.1 and simulated in Sec. 6.2) is embodied by large-scale peanut crop fields in Tifton, GA, managed by the Department of Agriculture at the University of Georgia, Tifton campus. Extant stresses include crop field erosion, damage done by local fauna, and several introduced stresses including fungus and pathogen introductions resulting in crop blight and lesions.

The functional capabilities of our sensor modalities are defined as follows:

• L3: The highest level is composed of satellites, which collect images of the crops on average every week. The Sentinel 2a and 2b satellites will be used for this layer of information. The data from these satellites are available for free, and the ESA also provides a toolbox for processing the collected data. This level is passive, as it is not directly controlled, instead limited by a fixed temporal component of passing over the same location once every 5 days. Image data in this layer has a multispectral resolution of 10 meters per pixel and a thermal resolution of 20 meters per pixel.

• L2: Layer 2 represents the multispectral, high-dimensional image data taken by AUAVs. While AUAVs have a very high degree of control and speed of execution with limited challenge in pathing, images are taken from above their targets. Images generally have a resolution of 1-3 centimeters per pixel depending on the height of the AUAV and the spatial resolution of the camera.

• L1: Like L2, this layer represents a controllable physical subsystem, but is executed by AUGVs. Pathing is significantly more challenging and, even given high-resolution geolocation sensors, may require exploration for the AUGV to reach its target. Though targeted exploration is more computationally costly and slower, image data can be collected from a variety of perspectives (individual leaves, fruit, flowers, whole plants) and has a similar resolution to L2, albeit from alternative angles.

• L0: Although these sensors often do not directly measure features arising from the onset of stress, we include a layer representing an array of ground sensors that measure air and soil moisture, ground temperature, and relative humidity. This information is valuable for predictive learning approaches: it extends the data model beyond the other layers’ image data and improves model accuracy by capturing features on which the resultant stresses are conditionally dependent.

As Fig. 3 illustrates, layers 3, 2, and 1 collect image data at various resolutions and spectra of the crop field, while also collecting environmental data from the layer 0 sensors. Each layer is able to instantly communicate with a centralized server via a wireless network infrastructure. This centralized server then performs analysis and reasoning on the collected image and environmental data, described in the next section.

5 Agricultural Distributed Decision Framework

The ADDF is comprised of two distinct functional capabilities: the processing of NDVI image data to compute the evolution of point-in-time crop growth metrics and the ability to learn when to raise a call-to-action, believing that processed image data indicates immediate or impending crop stress. The former capability requires collecting NDVI image data, computing a moving average pixel sufficient metric, and identifying deviating segments of crop fields. The latter capability introduces a reinforcement learning technique for exploring policies that, based on continuous-value observations, trigger a call-to-action.

5.1 Processing Image Data

As introduced in Sec. 3.1, composing metrics of growth patterns via NDVI imaging is a popular methodology. Unfortunately, much of the contemporary body of work leverages retrospective curve-fitting as a departure point. As our domain requires that immediate action be taken (i.e. at various points during the growing season), we instead opt to develop a sufficient statistic computed from two or more NDVI images taken at potentially diverse intervals by proportionally merging error values.

Two distinct types of image processing using this methodology are required. The first task is simple: two aligned NDVI images, taken from highly similar perspectives (L3 and some L2 tasks), are compared via image difference with $p$-square-pixel approximation. Because the images may deviate slightly (through erroneous alignment or evolution of crop size), the approximation, which takes the average index of a $p\times p$ square, can be used to alleviate information loss.

The second, more complex methodology required is analyzing images of crops from dramatically different angles, comprising some of L2’s tasks and nearly all of L1’s. In this case, we instead generate a distribution of the NDVI image coloration of a crop and compare the distribution to the average coloration taken by the layer. In this work, we focus on the first task and the theory of composing a heterogeneous team, leaving this task and online experiments for future work.

Figure 4 shows a trivial implementation of a single pairwise comparison of two images taken 3 weeks apart of our domain’s peanut field in the late 2017 growing season. Interestingly, the results are quite telling. Known to the field workers, three standout features are identifiable via this comparison: (1) the field is bisected by two lines as it was originally 4 fields, (2) near the bisection point significant crop erosion is occurring, and (3) the west, and particularly northwest, sector of the field is beset by damage from grazing deer.

Algorithm 2 annotates the relatively simplistic process of generating a difference sample between two NDVI images. The resultant matrix approximates the index across $p\times p$ pixel squares, the set of which becomes a list of pairwise comparisons. We are interested in $p$-size sectors that, over the growing season, have very low variance, as it indicates that either (a) crops are not growing or are dying quickly, or (b), at the near infrared portion of the spectrum, low reflectance indicates low crop health from water stress.
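Since the algorithm listing itself is not reproduced here, the following is only a rough sketch of the difference step as described in the prose. The function names `block_average` and `block_diff` are hypothetical, images are assumed to be plain lists of rows of NDVI values, the difference is assumed to be signed (later minus earlier), and edge pixels that do not fill a complete $p\times p$ square are assumed to be dropped.

```python
def block_average(image, p):
    """Average an image (list of rows) over non-overlapping p x p squares.

    Assumption: rows and columns that do not fill a whole square are dropped.
    """
    rows, cols = len(image) // p, len(image[0]) // p
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = sum(image[r * p + i][c * p + j]
                        for i in range(p) for j in range(p))
            out_row.append(total / (p * p))
        out.append(out_row)
    return out


def block_diff(earlier, later, p):
    """Signed difference (later - earlier) of two aligned images after averaging."""
    a, b = block_average(earlier, p), block_average(later, p)
    return [[y - x for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


earlier = [[1, 1, 2, 2],
           [1, 1, 2, 2]]
later = [[2, 2, 2, 2],
         [2, 2, 2, 2]]
print(block_diff(earlier, later, p=2))  # one row with two sector differences
```

Averaging before differencing is what gives the method some tolerance to small misalignments between captures, at the cost of resolution.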

As we are primarily interested in disproportionate changes in reflectance and growth, instead of examining pure average indexes across the pairwise differences of images, we focus on the variance of those images. Following the matrix generation in Alg. 2, Alg. 3 computes the variance across all pairwise differences and then normalizes relative to the highest variance. When analyzing the resultant variance matrix, lower values are more concerning than higher ones.
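The variance step can be illustrated as follows. This is a sketch under stated assumptions rather than the paper's Alg. 3: the function name `normalized_variance` is hypothetical, the population variance is used, and the input is assumed to be a list of equally sized difference matrices (one per pairwise comparison).

```python
def normalized_variance(diffs):
    """Per-sector variance across difference matrices, scaled so the peak is 1.

    Assumption: population variance (divide by the number of matrices).
    """
    n = len(diffs)
    rows, cols = len(diffs[0]), len(diffs[0][0])
    var = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [d[r][c] for d in diffs]
            mean = sum(values) / n
            var[r][c] = sum((v - mean) ** 2 for v in values) / n
    peak = max(max(row) for row in var)
    if peak == 0:
        return var  # every sector is constant over time; nothing to scale
    return [[v / peak for v in row] for row in var]


# Two difference matrices over a field with two sectors.
diffs = [[[0.0, 1.0]],
         [[2.0, 1.0]]]
print(normalized_variance(diffs))  # second sector never changes, so it scores 0
```

In this toy input the second sector has zero variance, which under the text's reading is the more concerning of the two.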

As a last step, the resultant matrix serves as rudimentary image data in which lower variance is represented as higher-intensity pixels. In order to combat variance diffuseness, which is caused by crops separated by rows of soil, we crop the image to the field and apply a Gaussian blur where $\sigma=2.5$. We then segment the image via straightforward applications of image segmentation algorithms, such as K-means [20, 4], to create afflicted sectors of the crop field. These sectors, based on their average variance, are then decision moments for the agents described in the following sections.
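To make the segmentation step concrete, here is a small K-means sketch over pixel intensities. It is an illustration only: the function name `kmeans_1d` is hypothetical, clustering is done on intensity alone (ignoring pixel position), the centers are initialized evenly between the extremes, and the Gaussian blur is omitted. None of these choices is confirmed by the text.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar intensities into k groups; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Assumption: deterministic start, centers evenly spaced between extremes.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members
                               else centers[j])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


intensities = [0.0, 0.1, 0.9, 1.0]
centers, labels = kmeans_1d(intensities, k=2)
print(labels)  # low and high intensities fall into separate clusters
```

Each resulting cluster corresponds to one severity level; in an image setting, connected pixels sharing a label would form the sectors handed to the agents.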

5.2 Composing a Heterogeneous Team

As introduced in Sec. 4, we have four layers of sensor modalities, each with the capability to take either image or environmental data at varying resolutions. We are tasked with representing these layers, then, as a team, with the common goal of identifying a stress while balancing (a) speed of identification and (b) accuracy of the eventual categorization.

Casting this problem as decision-theoretic is rather straightforward. Each layer is solving an independent game formulation, though the state of the environment is (largely) identical for each of them. Since the eventual categorization is used to inform the decisions of higher level layers, we adopt the perspective of reinforcement learning. Borrowing from Perkins’ MCES-P, we explore a set of policies mapping real-valued observations of the variance across images of a crop field to call-to-actions. Figure 5 presents a visual representation of the actions of layers and the propagation of feedback.

We first present the general POMDP frame for layers 3, 2, and 1. The problem an individual agent faces is defined as a tuple $ADDF_{i}^{L}=\langle S,A,T,\Omega,O,R\rangle$, where $L$ and $i$ refer to the layer and agent respectively. $A^{L}$ refers to actions taken at this layer, and $A^{L-1}$ refers to the eventual action of the subsequent layer. Layer 0 has no agency and is represented in our framework as additional state observation information.

• $S$: the distribution of stress and agent location over a multi-row and multi-column large-scale crop field

• $A$: the set of actions, uniquely defined per layer

– L3: take low-resolution images of the entire field on rare occasions

– L2: move; take an image of a field or section

– L1: move; take images of individual plants, leaves, and fruit

• $T=S\times A\times S$: state transition function dependent on agent movement

• $\Omega$: the set of observations. Each agent receives information from layer 0 and information as to how each sector/crop deviates from expected image data using Alg. 3. We discretize to levels of severity via clustering.

• $O=S\times A$: the observation function, which determines the observation received given the state and the action taken

• $R=S\times A^{L}\times(A^{L-1})\rightarrow\mathbb{R}$: the reward function, dependent on the agent’s action and the decision made by the subsequent layer. The exception is at layer 1, which makes the final decision on the presence of a stress.

Examining the reward function $R$ raises an interesting caveat of our domain: reinforcements must be delayed due to relying on subsequent layer categorization, and games are played in parallel even within each layer. Since each game is a single horizon, and the policies that are learned are memory-less and reactive, this only means that games may be resolved outside the order they were played in. Figure 6 demonstrates how a layer 3 agent may begin playing a game before a previous decision was reinforced.

Here we present the algorithms for ADDF. We generally require two flavors of ADDF: a high level agent that may tackle several sectors at once (such as for layer 3 and, sometimes, layer 2) and a lower level, sector- and crop-specific agent (layer 2 and 1). The high level agent creates multiple decision points for the lower agent, which often must tackle the sectors one-by-one.

We begin by covering the high-level rotation of actions as a crop season progresses, defined in Alg. 4. Line 1 initializes the Q-table, counts, and policies for each layer. Line 6 runs the call-to-action generation for Layer 3, leveraging its current learned policy. Layer 2 is a bit different. While line 7 also generates calls to action, any sector it considers without stress is served as stimuli for Layer 3. Layer 1 only generates stimuli. Line 10 then updates all Q-values and transforms layer policies that have a new best action using reward stimuli.

As an important point, we omit the constraint of parallelism in the description of Alg. 4 for the purpose of brevity and clarity. Line 7, for example, doesn’t occur every iteration. This is accomplished instead by creating a queue from line 6 and iteratively executing line 7 until the queue is empty, illustrated in Fig. 6. Line 10 only executes when Layer 1 or 2 completes a task.

Algorithm 5 defines the L3 policy exploration and execution process. As in Alg. 1, we select a random observation-action pair to explore, creating transformed policy $\pi^{\prime}$ . By taking variance samples of target sectors, L3 generates observations, which it then acts on using the transformed policies. For those actions that indicate a stress, L3 returns a call-to-action.

Since information must be passed between layers, we define $\tau$ differently than canonical MCES-P. Here, each element in $\tau^{Li}$ includes the observation, action, and sector index $(o,a,s)$ of that layer, omitting rewards. When we return rewards to preceding layers via $\vec{r}$, the preceding layer then knows which observation and action to update.
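One way to picture this bookkeeping is to key the delayed rewards by sector index. The sketch below is hypothetical: the names `Step` and `route_rewards` do not come from the paper, and the rewards returned by the later layer are assumed to arrive as a mapping from sector index to reward.

```python
from collections import namedtuple

# One trajectory element: observation, action, and sector index (no reward).
Step = namedtuple("Step", ["obs", "action", "sector"])


def route_rewards(trajectory, rewards):
    """Pair delayed per-sector rewards with the (obs, action) that produced them.

    trajectory: list of Step recorded by the preceding layer.
    rewards: dict of sector index -> reward reported by the later layer
             (assumed format). Sectors with no reward yet are skipped, since
             their games may still be unresolved.
    """
    return [((step.obs, step.action), rewards[step.sector])
            for step in trajectory if step.sector in rewards]


trajectory = [Step(obs=2, action="call", sector=0),
              Step(obs=0, action="ignore", sector=3)]
print(route_rewards(trajectory, {0: 1.0}))  # only sector 0 has been resolved
```

Because each game is single-horizon and policies are reactive, updates produced this way can be applied in whatever order the rewards arrive.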

The algorithm for Layer 2 is omitted, as it differs from Layer 3 only in that it returns trajectories from $\vec{\tau}^{L3}$ as a negative reward if it fails to detect a stress in that sector, similar to Layer 1’s line 5 below, albeit only negative rewards.

In our formulation, Layer 1 is considered objective in its classification, and generates stimuli for the other two (or more) layers. Like the previous layers, it generates an image of a particular crop, but if it detects significant non-uniformity, it immediately classifies the sector as stressed.

Each iteration concludes with an update of the Q-table. As in MCES-P, a transformation is possible only once the sample complexity $k$ is satisfied, that is, once the agent has learned enough about each candidate policy.

Our domain differs from predominant exploration of POMDP domains in that the observation function is continuous. That is, we receive real-valued differentials between expected crop growth when processing image data. However, by utilizing lossless conversion to observation space partitions [7], we can solve this continuous problem discretely, even considering layer 0 continuous-value environmental metadata.
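For intuition only, a real-valued reading can be mapped to one of $|O|$ discrete observations by binning. The equal-width bins below are an illustrative stand-in and are not the lossless partitioning of [7]; the function name `discretize` and the default range of $[0,1]$ (matching a normalized variance) are assumptions.

```python
def discretize(value, n_obs, low=0.0, high=1.0):
    """Map a reading in [low, high] to an observation index 0..n_obs-1.

    Assumption: equal-width bins; readings outside the range are clipped.
    """
    if high <= low:
        raise ValueError("high must exceed low")
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * n_obs)
    return min(index, n_obs - 1)  # value == high belongs to the last bin


print([discretize(v, n_obs=3) for v in (0.0, 0.5, 1.0)])
```

A partition derived from the policy's decision boundaries, as in [7], would replace the fixed bin edges used here.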

In layers that contain multiple agents (excluding layers 3 and 0), since the problem requires potentially reacting to multiple call-to-actions, we will utilize a generalization of the POMDP that solves for instant-communication team play, the Multiagent POMDP [2]. The MPOMDP is an interesting formulation, representing the dynamism of the decision problem by individually capturing the characteristics of each component of the system (in this case, the layers). However, it is well-known that an MPOMDP can be directly converted to an equally expressive POMDP and solved as such [1]. The value is in its interpretability, which doesn’t affect the complexity to solve it.

We represent a layer’s homogeneous team as a tuple $ADDF^{L}=\langle Z,S,\vec{A},T,\Omega,O,R\rangle$, with new or modified frame elements:

• $Z$: the set of agents

• $\vec{A}=\{A^{L}_{z}\}_{z=1}^{|Z|}$: the set of all agents’ actions

• $O=S\times\vec{A}$: the team observation function, where the state and the team’s actions result in a single, shared observation

Utilizing our reinforcement learning perspective, we then learn the optimal policy for each layer as a Monte Carlo solution to the (M)POMDPs [19, 13].

5.3 Tackling the Hyper-Conservative Local Optima

One complication of our methodology is that layers are incentivized to be highly conservative in their call-to-actions. Since the presence of a stress is a rare event vis a vis normal growth patterns, learning is inherently biased to rejecting the stress [6]. Besides dramatically overweighting rewards that indicate stress, we propose two options: forced exploration even under rejection and random exploration in layer 1.

The first option considers reserving a few call-to-actions at each interval (demonstrated in Fig. 6) to create an objective for subsequent layers to explore near-accepted call-to-actions. For example, even if layer 3 rejects the notion of a stress in a particular sector, it will still inform layer 2 of a stress in that sector (along with other positively-identified stresses), a strategy similar to random policy exploration [24]. This may be added to Alg. 5 by adding rejected sectors to the call-to-actions $\tau$ with exponentially-decaying weight $w(\tau,m,i)=\frac{m}{m+|\tau|+i}$ for each $i$ th rejected sector where $m$ controls the steepness of the exponential decay. This methodology is explored in Sec. 6.
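The weight as written above is simple to evaluate. In this sketch the function names are hypothetical, $|\tau|$ is passed in as `n_calls` (the number of call-to-actions already raised), and the index $i$ of a rejected sector is assumed to start at 1.

```python
def rejection_weight(m, n_calls, i):
    """w(tau, m, i) = m / (m + |tau| + i) for the i-th rejected sector."""
    if m <= 0:
        raise ValueError("m must be positive")
    return m / (m + n_calls + i)


def rejection_weights(m, n_calls, n_rejected):
    """Weights for the first n_rejected rejected sectors (i assumed 1-based)."""
    return [rejection_weight(m, n_calls, i) for i in range(1, n_rejected + 1)]


# Two accepted call-to-actions, three rejected sectors, m = 5.
print(rejection_weights(m=5, n_calls=2, n_rejected=3))
```

Larger $m$ keeps the weights closer to 1 for longer, so more rejected sectors are likely to be forwarded; a larger $|\tau|$ lowers every weight, reserving forced exploration for intervals with few genuine calls.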

The second option capitalizes on potential dead-time in the lower layers. When not exploring the crop field, we can opt to set layer 1 and 2 to an exploration mode, where they peruse the field randomly. When a stress is identified, a call-to-action may be simulated sequentially through the entire system and immediately rewarded appropriately, creating a positive sample to learn from. We reserve exploration of this methodology for real-world experiments in future work.

6 Experimental Results

We propose two toy implementations of our problem domains to test the approaches in Secs. 5.1 and 5.2. We first examine the performance of our image segmentation technique for identifying stressed crop areas by utilizing actual AUAV images collected across several weeks of a growing season. We then introduce a two-layer POMDP implementation of our ADDF framework with a simulated toy environment, showing the effectiveness of learning a reward function vis a vis a subsequent layer’s decisions. In practice, Sec. 6.1 produces inputs for Sec. 6.2, but we perform separate experiments in this work.

6.1 Identifying Phenotypic Stress

Through technology supplied to the Department of Entomology at the University of Georgia, Tifton campus, we collected 5 images, each separated by 7 to 9 days, in July and August of 2017 of a peanut field. These images were collected by a 3DR Solo aerial drone indexed using NDVI at a resolution of 1-3 centimeters per pixel. The resultant files were encoded as Tagged Image File Format (TIFF), sized around 20 megabytes each.

Since these images took place in a previous growing season, the department is aware of several stresses that occurred during the growing season. First, the peanut field used to be four separate fields, bisected by two roads, and a sector just north of the east-west road suffered from stress due to soil erosion.

We experimented with several parameter settings for $p$, $\sigma$, and, in the case of K-means, $k$. Though we hand-selected parameters for the final result in this section, we hypothesize that, using gradient exploration methodologies, parameters could be fit by optimizing for fit in the final segmentation technique (such as the elbow method for K-means [10]).

For Alg. 2, we set $p=12$, converting original image dimensions of around $4900\times 4200$ to $620\times 530$. Figure 7 shows the effect of approximation, resulting in significantly less diffuseness between crop rows, though some is retained. Since the result of Gaussian blurring can lose too much information to compare between images, we wait until the final comparative image before applying it. Next, we compared the differences between two subsequent crop field images.

In Fig. 8 we see the result of the first $diff$ performed in our sample set for the first two dates. It confirms what we know about this data set: the north side of the field (which is darker, indicating less change) is performing significantly worse than the rest of the field, and roads bisecting the field are clearly visible.

Examining the images in Fig. 9 corroborates the growing concern from $diff$s computed on July 13th and 21st that the middle mass of the crop field is not producing. One drawback of this methodology is that the stress is certainly visible by the $diff$ on July 13th (recovering slightly by the 21st). After applying Alg. 3, since the variance image is still too diffuse for effective image segmentation, we apply a Gaussian blur and then segment, demonstrated in Fig. 10.

Completing the approximation and segmenting technique then results in highly defined and contoured areas, shown in Fig. 10.c for July 21st. Three distinct layers are clearly visible (though the number of actual layers is determined by the input $k$ for the K-means method, $10$ in our experiments). The center road and previously mentioned erosion due north of the center of mass are both illustrated as very severe, with less severe sectors in the south and northeast.
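The blur-then-segment step can be sketched as follows. This is a minimal NumPy-only illustration, not the implementation used in the experiments: the separable Gaussian filter, the edge padding, the evenly spaced initial centers, and the function names are all assumptions, and K-means is run on pixel intensities only.

```python
import numpy as np


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Blur a 2-D image with a separable Gaussian kernel (edge-padded)."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Filter along rows, then along columns; "valid" restores the input shape.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def kmeans_segment(image: np.ndarray, k: int, iterations: int = 20):
    """Cluster pixel intensities into k groups; return label image and centers."""
    values = image.reshape(-1).astype(float)
    # Assumed initialization: centers evenly spaced over the intensity range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iterations):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = values[labels == j]
            if members.size:
                centers[j] = members.mean()
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels.reshape(image.shape), centers


# Toy field: left half unchanged (0.0), right half strongly changed (1.0).
field = np.zeros((8, 8))
field[:, 4:] = 1.0
labels, centers = kmeans_segment(gaussian_blur(field, sigma=1.0), k=2)
```

With $k=2$ the toy field separates into its two halves; the experiments in this section use $k=10$ on the blurred variance image.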

We complete this section by illustrating the K-means processing for the final two dates in our test set. Over the course of the first month and a half, we see that growth is stymied after our first completion and the problem worsens in the north side of the field, though it improves in the south. By the last date, August 11th, the field is significantly healthier, though the most severe problem areas from July 13th persist.

6.2 Solving Inter-Team Objectives

We test the effectiveness of the ADDF algorithm in a toy implementation of the crop field problem (code available on GitHub at https://github.com/quinngroup/addf_pomdp). In our experiment, two agents form a two-layer team, where the first agent (referred to as the "fast" agent) takes a low-precision image of an entire crop field, which contains 5 sectors, once every 3 days. The second, "slow" agent can collect image data only for a single sector, but can do it once a day. To simulate an L1 classifier, we use an oracle that returns the true state of the sector after each agent acts on a sector. Each agent receives one of $|O|$ observations, each correlating to a confidence of the stress, from low to high. We illustrate the domain in Fig. 12, where the fast and slow agents are represented by a satellite and AUGV, respectively.

The simulator generates a stress in each sector at the beginning of a growing season, which has $89$ days, with a $50\%$ probability. Early in the season, the stress of the state is unstable, having a $50\%$ chance to change status. This likelihood decreases exponentially each subsequent day. The fast agent gets the maximally correlated observation of the true stress of the environment (either $o=0$ or $o=|O|$ for the lack or presence of a stress, respectively) with $70\%$ probability, while the slow agent receives it with $85\%$ probability. Incorrect observations are received with the remaining probability, weighted exponentially towards the correct classification. For example, with three observations, the slow agent's observations are received with probability $\{o_0=0.85, o_1=0.1, o_2=0.05\}$ when no stress is present.
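The simulator described above can be sketched in a few lines. This is an illustrative reconstruction, not the code in the linked repository: the daily decay factor of the flip probability (`decay`), the halving of weight with each step away from the correct observation, and all function names are assumptions, and observations are indexed from $0$ to $|O|-1$.

```python
import random


def observation_distribution(num_obs, stressed, p_correct):
    """Probability of each observation given the true state of a sector.

    The maximally correlated observation gets p_correct; the remainder is
    split with weights that halve per step away from it (assumed ratio).
    """
    if num_obs < 2:
        raise ValueError("need at least two observations")
    correct = num_obs - 1 if stressed else 0
    weights = [0.0 if o == correct else 0.5 ** abs(o - correct)
               for o in range(num_obs)]
    total = sum(weights)
    probs = [(1.0 - p_correct) * w / total for w in weights]
    probs[correct] = p_correct
    return probs


def sample_observation(num_obs, stressed, p_correct, rng):
    """Draw one noisy observation for an agent."""
    probs = observation_distribution(num_obs, stressed, p_correct)
    return rng.choices(range(num_obs), weights=probs)[0]


def simulate_sector(days=89, p_initial=0.5, p_flip=0.5, decay=0.5, rng=None):
    """Daily true stress status of one sector over a season.

    Assumption: the flip probability on day d is p_flip * decay**d.
    """
    rng = rng or random.Random()
    stressed = rng.random() < p_initial
    history = []
    for day in range(days):
        if rng.random() < p_flip * decay ** day:
            stressed = not stressed
        history.append(stressed)
    return history


rng = random.Random(0)
season = simulate_sector(rng=rng)
slow_probs = observation_distribution(3, stressed=False, p_correct=0.85)
fast_obs = sample_observation(3, season[0], p_correct=0.70, rng=rng)
```

Under the assumed halving ratio, the slow agent's three-observation distribution for an unstressed sector is approximately 0.85, 0.10, and 0.05.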

We test ADDF with configurations varying the number of observations and the inclusion of the heuristic in Sec. 5.3, which adds rejected sectors to calls to action with exponentially decaying probabilities. Each configuration is tested with 500 seasons. We begin by setting the baseline with a canonical Q-learning technique, which does not perform policy exploration and always exploits the current highest action-value for an observation.
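A purely exploitative tabular Q-learner of the kind described for the baseline might look like the sketch below. The two-action space, the reward values, the learning rate, the discount factor, and the tie-breaking toward rejection are assumptions for illustration; only the standard Q-learning update and the absence of exploration are taken from the text.

```python
from collections import defaultdict

REJECT, INVESTIGATE = 0, 1  # hypothetical action labels


class GreedyQLearner:
    """Tabular Q-learning keyed by observation, with no exploration."""

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(lambda: [0.0, 0.0])

    def act(self, observation):
        values = self.q[observation]
        # Always exploit the highest action-value; ties resolve to REJECT.
        return REJECT if values[REJECT] >= values[INVESTIGATE] else INVESTIGATE

    def update(self, observation, action, reward, next_observation):
        # Standard one-step Q-learning target.
        target = reward + self.gamma * max(self.q[next_observation])
        current = self.q[observation][action]
        self.q[observation][action] = current + self.alpha * (target - current)


learner = GreedyQLearner()
first_choice = learner.act(2)                 # untrained: tie, so REJECT
learner.update(2, REJECT, 1.0, 0)             # hypothetical positive reward
second_choice = learner.act(2)                # REJECT now strictly preferred
```

Because the alternative action is never sampled once one action has a higher value, an early preference of this kind cannot be revised by a learner that does not explore.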

As noted in Sec. 5.3, the relative rarity of positive stimuli for negative events causes Q-learning to quickly converge to always rejecting the presence of a stress in a sector. Therefore, Q-learning performs essentially the same as the random baseline. With almost no samples to learn from, we strike the essentially random accuracy of the slow agent. We then test ADDF in this domain with trajectory limit $k=500$ .

ADDF performs dramatically better than the Q-learning baseline, achieving over $80\%$ accuracy for the slow agent and the 3-observation fast agent, and performing relatively worse with more observations for the fast agent. The fast agent performs comparatively worse than the slow agent, likely due to the increased noise in its observation function. These numbers are quite close to the theoretical maximum, given the noise in the environment.

Both the baseline and ADDF can potentially benefit from increased workload. The baseline rejects nearly every sector, and, referring to Tbl. 2, it is clear the slow agent is not given enough decision points. For example, when $|O|=3$ , the fast agent generates just under 60 calls to the slow agent per season, when it can work 90. Without lowest-level agents working every day, many stressed crops will not be identified. We show the results for both the baseline and MCESP with the workload heuristic in Sec. 5.3 with $m=5$ .
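One way to picture the workload heuristic is the sketch below, which re-adds rejected sectors to the list of calls with exponentially decaying probability. The ordering of rejected sectors, the `decay` parameter, and the function name are hypothetical; how the parameter $m$ from Sec. 5.3 enters the heuristic is not specified in this section, so it is deliberately not modeled.

```python
import random


def add_rejected_sectors(accepted, rejected, decay=0.5, rng=None):
    """Extend the calls to a lower layer with some rejected sectors.

    Assumption: `rejected` is ordered from most to least suspicious, and
    the sector at rank i (0-based) is added with probability decay**(i + 1).
    """
    rng = rng or random.Random()
    calls = list(accepted)
    for rank, sector in enumerate(rejected):
        if rng.random() < decay ** (rank + 1):
            calls.append(sector)
    return calls


# Sector 1 was flagged by the fast agent; sectors 3 and 4 were rejected.
calls = add_rejected_sectors([1], [3, 4], decay=0.5, rng=random.Random(0))
```

Setting `decay` to 0 recovers the non-heuristic behavior, while values near 1 keep the lower-layer agent busy on most days at the cost of more visits to unstressed sectors.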

Table 3 shows a dramatic performance boost for the baseline but, much more importantly, additionally demonstrates the slow agent working nearly every single available day. While ADDF still outperforms the baseline, it does not show remarkable improvement over the non-heuristic version. However, this isn’t the major contribution of the heuristic. The impact is that, since the slow agent works more days, it identifies up to $83\%$ more stressed crops.

7 Future Work

Several avenues exist for expanding the prototypical implementation we presented in this work. While we simulate layer 0 information in Sec. 6.2, we do not explore the ramifications it may have on image processing in Sec. 6.1. Additionally, layer 1 image processing is a completely different, and far more complex, task than the algorithms presented in Sec. 5.1. However, even in real-world experiments, the tasks accomplished by layer 1 may be readily replicated by a plant pathologist.

Image processing may additionally be improved, beyond layer 1, by aligning images via homography. Some recent work on UAV image data employs homography computation to correct for small alterations, which would relieve some of the pressure on our approximation technique in Sec. 5.1 [5].

Further expanding on future work in Sec. 5.1, there is considerable room for more sophisticated image segmentation techniques that can immediately be leveraged by the ADDF framework. Primarily, performing image segmentation to identify and align subsections across the temporal evolution of the system, and then performing RMSD on subimages, is a potentially fruitful path, explored recently via deep neural networks [17] as a learned classifier. This would be particularly valuable as a replacement for the proposed layer 1 methodology.

Bibliography

[1] C. Amato and F. A. Oliehoek. Scalable planning and learning for multiagent pomdps. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15) (to appear), 2015.
[2] C. Boutilier. Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge, pages 195–210. Morgan Kaufmann Publishers Inc., 1996.
[3] A. V. Egorov, D. P. Roy, H. K. Zhang, M. C. Hansen, and A. Kommareddy. Demonstration of percent tree cover mapping using landsat analysis ready data (ard) and sensitivity with respect to landsat ard processing level. Remote Sensing, 10(2):209, 2018.
[4] R. Gray and Y. Linde. Vector quantizers and predictive quantizers for gauss-markov sources. IEEE Transactions on Communications, 30(2):381–389, 1982.
[5] T. Guo, T. Kujirai, and T. Watanabe. Mapping crop status from an unmanned aerial vehicle for precision agriculture applications. ISPRS-International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pages 485–490, 2012.
[6] R. Hertwig, G. Barron, E. U. Weber, and I. Erev. Decisions from experience and the effect of rare events in risky choice. Psychological science, 15(8):534–539, 2004.
[7] J. Hoey and P. Poupart. Solving pomdps with continuous or large discrete observation spaces. In IJCAI, pages 1332–1338, 2005.
[8] S. Khanal, J. Fulton, and S. Shearer. An overview of current and potential applications of thermal remote sensing in precision agriculture. Computers and Electronics in Agriculture, 139:22–32, 2017.