Aerial Animal Biometrics: Individual Friesian Cattle Recovery and Visual Identification via an Autonomous UAV with Onboard Deep Inference

William Andrew; Colin Greatwood; Tilo Burghardt

arXiv:1907.05310·cs.RO·July 12, 2019

TL;DR

This paper presents a UAV-based system with onboard deep learning that autonomously finds and identifies individual cattle in open pastures, reporting error-free LRCN identification across the 18 identification instances triggered during its real-world field tests.

Contribution

It introduces what is, to the best of the authors' knowledge, the first proof-of-concept UAV platform for fully autonomous exploration and real-time, onboard biometric identification of individual animals in open environments, using three deep neural networks.

Findings

01 Error-free identification in real-world field tests

02 Successful autonomous low-altitude flight over dispersed herd

03 Integration of multiple deep learning models onboard UAV

Abstract

This paper describes a computationally-enhanced M100 UAV platform with an onboard deep learning inference system for integrated computer vision and navigation able to autonomously find and visually identify by coat pattern individual Holstein Friesian cattle in freely moving herds. We propose an approach that utilises three deep convolutional neural network architectures running live onboard the aircraft; that is, a YoloV2-based species detector, a dual-stream CNN delivering exploratory agency and an InceptionV3-based biometric LRCN for individual animal identification. We evaluate the performance of each of the components offline, and also online via real-world field tests comprising 146.7 minutes of autonomous low altitude flight in a farm environment over a dispersed herd of 17 heifer dairy cows. We report error-free identification performance on this online experiment. The presented…

Tables

Table I: Offline Performance Results.

# Sample Frames	# Animal Instances	Detection on Sample Frames, Accuracy (%)	Single Frame ID on Labelled RoIs, Accuracy (%)	Combined Detection+ID, Accuracy (%)
1039	111	92.4	93.6	91.9

Table II: Online Identification Results.

# Samples	LRCN Identification, Accuracy (%)	Single Frame Identification, Accuracy (%)
18	100	94.4

Full text

Aerial Animal Biometrics: Individual Friesian Cattle Recovery and Visual Identification via an Autonomous UAV with Onboard Deep Inference

William Andrew¹, Colin Greatwood² and Tilo Burghardt¹

¹Department of Computer Science, ²Department of Aerospace Engineering, Faculty of Engineering, University of Bristol, United Kingdom (UK).

This work was supported by the EPSRC Centre for Doctoral Training in Future Autonomous and Robotic Systems (FARSCOPE) at the Bristol Robotics Laboratory (BRL).

Abstract

This paper describes a computationally-enhanced M100 UAV platform with an onboard deep learning inference system for integrated computer vision and navigation. The system is able to autonomously find and visually identify by coat pattern individual Holstein Friesian cattle in freely moving herds. We propose an approach that utilises three deep convolutional neural network architectures running live onboard the aircraft: (1) a YOLOv2-based species detector, (2) a dual-stream deep network delivering exploratory agency, and (3) an InceptionV3-based biometric long-term recurrent convolutional network for individual animal identification. We evaluate the performance of each of the components offline, and also online via real-world field tests comprising 147 minutes of autonomous low altitude flight in a farm environment over a dispersed herd of 17 heifer dairy cows. We report error-free identification performance on this online experiment. The presented proof-of-concept system is the first of its kind. It represents a practical step towards autonomous biometric identification of individual animals from the air in open pasture environments for tag-less AI support in farming and ecology.

I Introduction and Related Work

This paper presents an unmanned aerial vehicle (UAV) platform with onboard deep learning inference (see Fig. 1) that autonomously locates and visually identifies individual Holstein Friesian cattle by their uniquely-coloured coats in low altitude flight (approx. 10m) within a geo-fenced farm area. The task encompasses the integrated performance of species detection, exploratory agency, and individual animal identification (ID). All tasks are performed entirely onboard a custom DJI M100 quadrotor with limited computational resources, battery lifetime, and payload size.

In doing so, this work attempts to assist agricultural monitoring in performing minimally-invasive cattle localisation and identification in the field. Possible applications include the behavioural analysis of social hierarchies [1, 2, 3], grazing patterns [4, 5] and herd welfare [6].

The search for targets with unknown locations traditionally arises in search and rescue (SAR) scenarios [7, 8, 9]. In visually-supported navigation for this task, approaches broadly operate either a map-based or map-less paradigm [10, 11]. Map-less approaches have no global environment representation and traditionally operate using template appearance matching [12, 13], optical-flow guidance [14], or landmark feature tracking [15, 16]. More recently, such systems have been replaced with visual input classification via convolutional neural networks (CNNs) [17, 18]. In this work, we build on a simulation setup presented in [19] and formulate a 2D global grid approximation of the environment (see map $M$ in Fig. 1) for storing visited positions, current location, and successful target recoveries. This concept is inspired by occupancy grid maps [20, 21], as opposed to post-exploration maps [22] or topological maps [23]. For our cattle recovery task – and despite their simplicity – grid maps still represent a highly effective tool [19] for exploring the solution space of AI solutions [24, 25].

Coat pattern identification of individual Friesian cattle represents a form of animal biometrics [26]. Early systems for the particular task at hand utilised the Scale-Invariant Feature Transform (SIFT) [27] on image sequences [28] or Affine SIFT (ASIFT) [29] to map from dorsal cow patterns to animal IDs [30]. However, for up-to-date performance we base our individual ID component on recent CNN-grounded biometric work [31], where temporal stacks of the regions of interest (RoIs) around detected cattle are analysed by a Long-term Recurrent Convolutional Network (LRCN) [32], as shown in Fig. 1 in red. This architecture represents a compromise between light-weight onboard operation and more high-end networks with heavier computational footprints.

Whilst aerial wildlife census applications routinely use manually controlled UAVs [33, 34, 35, 36], and have experimented with part-automated photographic gliders [37], to the best of our knowledge this paper presents the first proof-of-concept system for fully autonomous exploration and online individual biometric identification of animals onboard an aircraft.

To summarise this paper’s principal contributions:

• Proof-of-concept in the viability of autonomous aerial biometric cattle ID in a real-world agricultural setting.

• Novel combination of algorithms performing online target detection, identification and exploratory agency.

• Validation of the employed UAV hardware setup capable of deep inference onboard the flight platform itself.

• Real-world and live application of the exploratory simulation framework developed in our previous work [19].

II Hardware

We use the DJI Matrice 100 quadrotor UAV, which houses the ROS-enabled DJI N1 flight controller, as our base platform. It has been employed previously across various autonomous tasks [38, 39, 40, 41]. We extend the base M100 platform by adding an Nvidia Jetson TX2 mounted on a Connect Tech Inc. Orbitty carrier board to enable onboard deep inference via 256 CUDA cores under Nvidia’s Pascal™ architecture. Also onboard is the DJI Manifold (essentially an Nvidia Jetson TK1) to decode the raw image feed from the onboard camera (the DJI Zenmuse X3 camera/gimbal) and to add further computational capacity. The X3 camera is mounted on rubber grommets and a 3-axis gimbal, which allows for rotor vibration isolation and independent movement of the flight platform for stabilised footage and programmatically controlled roll-pitch-yaw. A Quanum QM12V5A-UBEC voltage regulation device was fitted to power non-conformal devices feeding off the primary aircraft battery. In addition, customised mounts were added for WiFi antennas to monitor the craft remotely. Figure 2 depicts the complete aircraft with views of custom components, whilst Figure 3 shows a detailed overview of the communication infrastructure. Note that the base station and remote control devices act in a supervisory role only; all inputs and autonomous control are processed and issued onboard the UAV.

III Experimental Setup

III-A Location, Timing and Target Herd

Flights were performed over a two-week experimental period at the University of Bristol’s Wyndhurst Farm in Langford Village, UK (see Fig. 4) on a consistent herd of 17 yearling heifer Holstein Friesian cattle (see Fig. 5), satisfying all relevant animal welfare and flight regulations. Experiments consisted of two phases: (a) training data acquisition across 14 (semi-)manual flights, and subsequently (b) 18 autonomous flights.

III-B Training Data, Annotation and Augmentation

Training data was acquired over two day-long recording sessions where manual and semi-autonomous flights at varying altitudes were carried out, recording video at a resolution of $3840\times 2160$ at $30$ fps. The result was a raw dataset consisting of 37 minutes from 15 videos over 14 flights occupying 18 GB. Overall, $2285$ frames were extracted from these video files at a rate of $1$ Hz.

First, after discarding frames without cattle, $3120$ bounding boxes around individual cattle were labelled in the $553$ frames containing cattle. Animals were also manually identified to provide ground truth for individual identification. Secondly, to produce ground truth for training the cattle detector, square sub-images (matching the YOLOv2 input tensor size) were manually annotated to encompass individuals such that they are resolved at approximately $150\times 150$ pixels. Figure 6 illustrates the full pipeline. To synthesise additional data, both the detection and identification datasets are augmented stochastically: each of the following operations is applied according to its own per-operation likelihood value, so that any combination of operations may occur. The operations are horizontal and vertical flipping, crop and pad, affine transformations (scale, translate, rotate, shear), Gaussian, average or median blurring, noise addition, background variations, contrast changes, and small perspective transformations. Figure 7 provides augmentation samples.
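The stochastic augmentation scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four operations shown are a small subset of those listed, images are simplified to nested lists of 8-bit grey values, and every likelihood value and parameter range below is a hypothetical placeholder, since the source does not report the values used.

```python
import random

def hflip(img, rng):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def vflip(img, rng):
    """Mirror the image top-to-bottom."""
    return [list(row) for row in img[::-1]]

def add_noise(img, rng, sigma=8.0):
    """Add zero-mean Gaussian noise, clamped to the 8-bit range."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]

def change_contrast(img, rng, low=0.75, high=1.25):
    """Scale pixel values about mid-grey by a random factor."""
    factor = rng.uniform(low, high)
    return [[min(255, max(0, round(128 + factor * (p - 128)))) for p in row]
            for row in img]

# (operation, likelihood) pairs; the likelihoods are assumptions for illustration.
PIPELINE = [(hflip, 0.5), (vflip, 0.5), (add_noise, 0.3), (change_contrast, 0.3)]

def augment(img, rng, pipeline=PIPELINE):
    """Apply each operation independently with its own likelihood."""
    out = [list(row) for row in img]
    applied = []
    for op, likelihood in pipeline:
        if rng.random() < likelihood:
            out = op(out, rng)
            applied.append(op.__name__)
    return out, applied
```

Because each operation is drawn independently, any combination of operations can occur for a given sample, which matches the behaviour described in the text.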

IV Software, Implementation and Training

IV-A Object Class Detection

Cattle detection and localisation is performed in real-time frame-by-frame using the YOLOv2 [43] CNN. The network was retrained from scratch on the annotated region dataset, consisting of $11,384$ synthetic and non-synthetic training images (see Fig. 7, bottom) and associated ground truth labels. Model inference operates on $736\times 736$ images obtained by cropping and scaling the source $960\times 720$ pixel camera stream. As shown in Figure 1, this process yields a set of $m$ bounding boxes $B=\{b_{0},b_{1},...,b_{m-1}\}$ per frame with associated object confidence scores. Inference on each of $n=5$ sampled frames then produces a box-annotated spatio-temporal volume $\{B_{0},B_{1},...,B_{n-1}\}$. Bounding boxes are associated across this volume by accumulating detections that are consistently present in cow-sized areas of an equally subdivided image. This method is effective for reliable short-term tracking due to distinct and non-overlapping targets, slow target movement and stable UAV hovering. The outputs are $p\leq n$ short individual animal tracklets reshaped into an image patch sequence $\{T_{0},T_{1},...,T_{p}\}$, which forms the input to the individual identification network (see Section IV-D). In addition, the current frame $I$ is also abstracted to a $5\times 5$ grid map $S$ encoding animal presence in the field of view of the camera (see Fig. 1). This forms the input to the exploratory agency network discussed in the following section.
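As an illustration of the abstraction from frame $I$ to the $5\times 5$ grid map $S$, the sketch below marks a grid cell as occupied whenever the centre of a detected bounding box falls inside it. The binary encoding, the use of box centres, and the function name `boxes_to_grid` are assumptions made for this example; the source does not specify how $S$ is populated.

```python
def boxes_to_grid(boxes, frame_w, frame_h, cells=5):
    """Abstract bounding boxes (x_min, y_min, x_max, y_max) in pixels
    into a cells x cells binary presence map."""
    grid = [[0] * cells for _ in range(cells)]
    for x_min, y_min, x_max, y_max in boxes:
        # Use the box centre to decide which cell the animal occupies.
        cx = (x_min + x_max) / 2.0
        cy = (y_min + y_max) / 2.0
        col = min(cells - 1, max(0, int(cx * cells / frame_w)))
        row = min(cells - 1, max(0, int(cy * cells / frame_h)))
        grid[row][col] = 1
    return grid

# Two hypothetical detections in a 736 x 736 inference frame.
S = boxes_to_grid([(0, 0, 100, 100), (600, 300, 736, 440)], 736, 736)
```

Here the first box lands in the top-left cell and the second in the right-most column of the middle row.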

IV-B Exploratory Agency $(EA)$

Navigation activities aim at locating as many – themselves moving – individual animals as possible on the shortest routes in a gridded domain where a target counts as ‘located’ once the agent occupies the same grid location as the target. To solve this dynamic travelling salesman task with city locations to be discovered on the fly, we use a dual-stream deep network architecture, as first suggested in our previous work [19]. The method computes grid-based navigational decisions $\{N,W,S,E\}$ based on immediate sensory (tactical/exploitation) and historic navigational (strategic/exploration) information using two separate streams within a single deep inference network. As shown in that work [19], this strategy can significantly outperform simple strategies such as a ‘lawnmower’ pattern and other baselines.

To summarise the method’s operation, the sensory input $S$ is processed via a first stream utilising a basic AlexNet [44] design (see Fig. 1). A second stream operates on the exploratory history thus far, as stored in a long-term memory map $M$ (see Fig. 1). This stores the agent’s present and past positions alongside animal encounters within the flight area of $20\times 20$ grid locations. The agent’s starting position is fixed and $M$ is reset after $\delta\%$ of the map has been explored. The outputs of both streams are concatenated and fed into a shallow integration network that, as shown in Figure 1, maps to a SoftMax-normalised likelihood vector $V$ of the possible navigational actions. During inference, the network selects the top-ranking navigational action from $\{N,W,S,E\}$ based on $V$. This action is performed and, in turn, the positional history $M$ is updated.
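The inference step described above (SoftMax normalisation, selection of the top-ranking action, update of $M$) can be sketched as follows. This is only an illustration under stated assumptions: the network itself is replaced by a given vector of raw scores, the cell codes `VISITED` and `AGENT`, the row/column convention for $\{N,W,S,E\}$, and the clamping of moves at the grid border are all hypothetical choices not taken from the source.

```python
import math

ACTIONS = ("N", "W", "S", "E")
# Assumed convention: rows grow southwards, columns grow eastwards.
MOVES = {"N": (-1, 0), "W": (0, -1), "S": (1, 0), "E": (0, 1)}
VISITED, AGENT = 1, 2  # hypothetical cell codes for the memory map M

def softmax(scores):
    """Numerically stable SoftMax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def step(memory, pos, scores):
    """Pick the top-ranking action from V and update M in place."""
    v = softmax(scores)
    action = ACTIONS[max(range(len(v)), key=v.__getitem__)]
    d_row, d_col = MOVES[action]
    size = len(memory)
    row = min(size - 1, max(0, pos[0] + d_row))  # stay inside the grid
    col = min(size - 1, max(0, pos[1] + d_col))
    memory[pos[0]][pos[1]] = VISITED
    memory[row][col] = AGENT
    return action, (row, col), v

def explored_fraction(memory):
    """Fraction of grid cells visited so far (to compare against delta)."""
    cells = [c for row in memory for c in row]
    return sum(1 for c in cells if c != 0) / len(cells)

M = [[0] * 20 for _ in range(20)]
M[0][0] = AGENT
action, position, V = step(M, (0, 0), [0.1, 0.2, 0.3, 2.0])
```

In this toy call the fourth score dominates, so the agent moves one cell east and the memory map records both the previous and the new position.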

For training, the entire two-stream navigation network is optimised via stochastic gradient descent (SGD) with momentum [45] and a fixed learning rate $e=0.001$ based on triples $(S;M;V)$ using one-hot encoding of $V$ and cross-entropy loss. This unified model allows for the back-propagation of navigation decision errors across both streams and domains. For training, we simulate $10,000$ episodes of $17$ pseudo-randomly [46] placed targets in a $20\times 20$ grid and calculate optimal navigation decisions $(S;M;V)$ by solving the associated travelling salesman problem. $10$-fold cross validation on this setup yielded an accuracy of $72.45\%$ in making an optimal next grid navigation decision and a target recovery rate of $0.26\pm 0.06$ targets per grid move. For full implementation details we refer to the original paper [19], which operates on simulations. In contrast, examples of real-world environment explorations during our 18 test flights are visualised in Figure 8 and Figure 9.
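To make the training objective concrete, the following sketch shows a cross-entropy loss against a one-hot encoded target $V$ and a single SGD-with-momentum parameter update. It is a generic textbook formulation rather than the authors' training code; the learning rate matches the stated $e=0.001$, whereas the momentum coefficient of 0.9 is an assumption, as the source does not report it.

```python
import math

def cross_entropy(v_pred, v_onehot, eps=1e-12):
    """Cross-entropy between a predicted likelihood vector and a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(v_pred, v_onehot))

def sgd_momentum_step(params, grads, velocity, lr=0.001, momentum=0.9):
    """One classical momentum update: v <- mu * v - lr * g; p <- p + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity

# A uniform prediction over {N, W, S, E} scored against the optimal action "S".
loss = cross_entropy([0.25, 0.25, 0.25, 0.25], [0, 0, 1, 0])

# Two consecutive updates of a single toy parameter with a constant gradient.
params, velocity = sgd_momentum_step([1.0], [2.0], [0.0])
params, velocity = sgd_momentum_step(params, [2.0], velocity)
```

With a uniform prediction over four actions the loss equals $\ln 4$, which is a convenient reference value for an untrained network.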

IV-C Coordinate Fulfilment

Re-positioning commands from $\{N,W,S,E\}$ to the M100 flight platform need to be issued via local position offsets in metres with respect to a programmatically set East North Up (ENU) reference frame. As such, in order to fulfil a target GPS coordinate arising from exploratory agency, it must be converted into that frame. This is achieved by converting the target GPS coordinate into the static Earth-Centred Earth-Fixed (ECEF) reference frame, then converting that coordinate into the local ENU frame. The same process is performed on the agent’s current GPS position, and the two resulting local positions are compared to obtain the required offset. Our implementation follows the standard as established in the literature [47, 48].
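The two-stage conversion (GPS to ECEF, then ECEF to local ENU) can be sketched with the standard WGS 84 formulas. The code below is an illustrative reconstruction, not the onboard implementation: the function names are hypothetical, the reference point and test coordinates are arbitrary values chosen for the example, and the use of the WGS 84 ellipsoid is an assumption consistent with common GPS practice.

```python
import math

WGS84_A = 6378137.0                   # semi-major axis in metres
WGS84_F = 1.0 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)  # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude/longitude/altitude to ECEF coordinates in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z

def ecef_to_enu(point, ref_lat_deg, ref_lon_deg, ref_alt_m):
    """Rotate an ECEF point into the ENU frame anchored at the reference."""
    lat, lon = math.radians(ref_lat_deg), math.radians(ref_lon_deg)
    rx, ry, rz = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_alt_m)
    dx, dy, dz = point[0] - rx, point[1] - ry, point[2] - rz
    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    return east, north, up

def gps_to_enu(lat_deg, lon_deg, alt_m, ref):
    """GPS -> ECEF -> ENU, with ref = (lat_deg, lon_deg, alt_m)."""
    return ecef_to_enu(geodetic_to_ecef(lat_deg, lon_deg, alt_m), *ref)

def position_offset(target_gps, current_gps, ref):
    """Local ENU offset in metres from the current to the target position."""
    target = gps_to_enu(*target_gps, ref)
    current = gps_to_enu(*current_gps, ref)
    return tuple(t - c for t, c in zip(target, current))

REF = (51.34, -2.78, 0.0)  # arbitrary illustrative reference point
offset = position_offset((51.3401, -2.78, 10.0), (51.34, -2.78, 10.0), REF)
```

In this example the target lies $0.0001^\circ$ of latitude north of the current position, which corresponds to a northward offset of roughly 11 m with negligible east and up components.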

IV-D Identity Estimation $(IE)$

Individual identification based on an image patch sequence $\{T_{0},T_{1},...,T_{p}\}$ is performed via an LRCN, first introduced by Donahue et al. [32]. In particular, as shown in Figure 1, we combine a GoogLeNet/Inception V3 CNN [49, 42] with a single Long Short-Term Memory (LSTM) [50] layer. This approach has demonstrated success in disambiguating fine-grained categories in our previous work [31].

Training of the GoogLeNet/Inception V3 network takes groups of $n=5$ same-class randomly selected RoIs (exemplified in Fig. 7, top), each of which was non-proportionally resized to $224\times 224$ pixels. SGD with momentum [45], a batch size of 32 and a fixed learning rate $e=0.001$ were used for optimisation. Figure 10 (right) provides evidence of per-category learning of appropriate spatial representations using local interpretable model-agnostic explanations [51], which qualitatively highlight the success of the Inception architecture in learning discriminative and fine-grained visual features for each individual. Once trained, samples are passed through this GoogLeNet up to the pool_5 layer and feature vectors are combined over the $n$ samples. A shallow LSTM network is finally trained on these vector sequences using a SoftMax cross-entropy cost function optimised against the one-hot encoded identities vector representing the $17$ possible classes. This approach achieved 100% validation accuracy with little training, as can be seen in Figure 10 (bottom).

V Real-World Autonomous Performance

We conducted $18$ fully autonomous test flights at a low altitude (approximately $10$ m) above an area of $20\times 20$ cells (see Fig. 4) covering altogether 147 minutes. Examples of environment explorations are visualised in Figure 8 and Figure 9 depicting various example flights with detailed annotations of flight path, animal encounters and identification confidences. For all experiments, we ran the object detection and exploratory agency networks live and in real-time to navigate the aircraft. Note that the herd was naturally dispersed in these experiments and animals were free to roam across the entire field in and out of the covered area. Thus, only a few individuals were present and recoverable in this area at any one time. The median coverage of the grid was $70.13\%$ with a median flight time of $8$ minutes and $9$ seconds per experiment, and a median of $77$ grid iterations per flight. For each of the flights, we conducted two types of experiment: (a) saving a single frame per visited grid location (due to onboard storage limitations) for a full, separate offline analysis of detection and identification performance after the flight, and (b) running multi-frame identification live during flight in all cases where the aircraft had navigated centrally above a detected individual.

V-A Offline After-Flight Performance Evaluation

For offline analysis, the UAV saved to file one acquired $720\times 720$ image at each exploratory agency iteration, yielding $1039$ images. $99$ of those images actually contained target cows that were hand labelled with ground truth bounding box annotations and identities according to the VOC guidelines [53]. A detection was deemed a successful true positive based on the IoU ($ov\geq 0.5$). Grounded in this, we measured the YOLOv2 detection accuracy to be $92.4\%$, where out of the $111$ present animals, $2$ were missed and $7$ false positive nested detections occurred (see Fig. 11). We then separately tested the performance of the single frame Inception V3 individual identification architecture (yielding $93.6\%$ accuracy), where all ground truth bounding boxes (not only the detected instances) were presented to the ID component. In contrast, and as shown in Table I, when identification is performed on detected RoIs only, the combined offline system accuracy is $91.9\%$.
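The true-positive criterion used above, an intersection-over-union (IoU) overlap of at least $0.5$ between a detection and a ground truth box, can be sketched as follows. The box format `(x_min, y_min, x_max, y_max)` and the function names are assumptions for this example. The final lines show one reading of the reported detection accuracy: counting $111-2=109$ true positives against $2$ misses and $7$ false positives gives $109/118\approx 92.4\%$. This formula reproduces the reported figure but is our inference; the source does not state how the accuracy was computed.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(detection, ground_truth, threshold=0.5):
    """A detection counts as a true positive when IoU >= threshold."""
    return iou(detection, ground_truth) >= threshold

# Assumed reading of the reported counts: TP / (TP + FN + FP).
true_pos, missed, false_pos = 111 - 2, 2, 7
detection_accuracy = 100.0 * true_pos / (true_pos + missed + false_pos)
```

For instance, two $10\times 10$ boxes shifted by 2 pixels overlap with an IoU of $80/120\approx 0.67$ and would count as a match, whereas a shift of 5 pixels gives $50/150\approx 0.33$ and would not.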

V-B Online In-Flight Performance Evaluation

For online autonomous operation, all computation was performed live in real-time onboard the UAV’s computers (DJI Manifold & Nvidia Jetson TX2). Figure 9 depicts various example flights with detailed annotations of flight paths, animal encounters and identification confidences. Across the $1039$ grid locations visited during the set of experiments, the aircraft navigated centrally above a detected individual $18$ times, triggering identification. Note that this mode of operation eliminates the problem of clipped visibility at image borders, minimises image distortions, optimises the viewpoint, and exposes the coat in a canonical orthogonal view. For triggered identification, we store intermediate LRCN confidence outputs after processing up to $5$ same-class patches $T$ to compare performance differences between single view and multi-view identification. Figure 12 depicts some same-class patch sequences $T$ and one instance where multi-frame inference was indeed beneficial to identification. The respective overall results are given in Table II. Notably, across the small online sample set ($18$ instances), the LRCN model performs perfectly.

VI Conclusion and Future Work

This paper provides a proof-of-concept that fully autonomous aerial animal biometrics is practically feasible. Operating in a real-world agricultural setting, the paper demonstrated that individual cattle identities can be reliably recovered biometrically from the air onboard a fully autonomous robotic agent. Experiments conducted on a small herd of 17 live cattle demonstrated the identification robustness of the proposed approach. In successfully performing these tasks under limited computational resources, battery lifetime and payload size, the presented system gives rise to future agricultural automation possibilities with potential positive implications for animal welfare and farm productivity.

Beyond farming, the concept of autonomous biometric animal identification from the air as presented opens up a realm of future applications in fields such as ecology, where animal identification of uniquely patterned species in the wild (e.g. zebras, giraffes) is critical to assessing the status of populations.

Bibliography

[1] R. Ungerfeld, C. Cajarville, M. Rosas, and J. Repetto, “Time budget differences of high- and low-social rank grazing dairy cows,” New Zealand Journal of Agricultural Research, vol. 57, no. 2, pp. 122–127, 2014.
[2] S. Kondo and J. Hurnik, “Stabilization of social hierarchy in dairy cows,” Applied Animal Behaviour Science, vol. 27, no. 4, pp. 287–297, 1990.
[3] C. Phillips and M. Rind, “The effects of social dominance on the production and behavior of grazing dairy cows offered forage supplements,” Journal of Dairy Science, vol. 85, no. 1, pp. 51–59, 2002.
[4] P. Gregorini, “Diurnal grazing pattern: its physiological basis and strategic management,” Animal Production Science, vol. 52, no. 7, pp. 416–430, 2012.
[5] P. Gregorini, S. Tamminga, and S. Gunter, “Behavior and daily grazing patterns of cattle,” The Professional Animal Scientist, vol. 22, no. 3, pp. 201–209, 2006.
[6] B. Sowell, J. Mosley, and J. Bowman, “Social behavior of grazing beef cattle: Implications for management,” Journal of Animal Science, vol. 77, no. E-Suppl, pp. 1–6, 2000.
[7] L. Lin and M. A. Goodrich, “UAV intelligent path planning for wilderness search and rescue,” in Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International Conference on. IEEE, 2009, pp. 709–714.
[8] S. Waharte and N. Trigoni, “Supporting search and rescue operations with UAVs,” in Emerging Security Technologies (EST), 2010 International Conference on. IEEE, 2010, pp. 142–147.
