Header

Title

A real-time remote surveillance system for fruit flies of economic importance: sensitivity and image analysis - Journal of Pest Science

Authors

Diller, Yoshua; Shamsian, Aviv; Shaked, Ben; Altman, Yam; Danziger, Bat-Chen; Manrakhan, Aruna; Serfontein, Leani; Bali, Elma; Wernicke, Matthias; Egartner, Alois; Colacci, Marco; Sciarretta, Andrea; Chechik, Gal; Alchanatis, Victor; Papadopoulos, Nikos T.; Nestel, David

Availability

CC BY 4.0

Source

SpringerLink (springer.com)

URL

https://link.springer.com/article/10.1007/s10340-022-01528-x

Date

2022-06-28

Description

Abstract

Timely detection of an invasion event or a pest outbreak is an extremely challenging operation of major importance for implementing management actions toward eradication and/or containment. Fruit flies (FF; Diptera: Tephritidae) comprise important invasive and quarantine species that threaten world fruit and vegetable production. The current manuscript introduces a recently developed McPhail-type electronic trap (e-trap) and provides data on its field performance in the surveillance of three major invasive FF (Ceratitis capitata, Bactrocera dorsalis and B. zonata). Using FF male lures, the e-trap attracts the flies and retains them on a sticky surface placed in the internal part of the trap. The e-trap captures frames of the trapped adults and automatically uploads the images to a remote server for identification by a novel algorithm based on deep learning. Both the e-trap and the developed code were tested in the field in Greece, Austria, Italy, South Africa and Israel. The FF classification code was initially trained using a machine-learning algorithm and FF images derived from laboratory colonies of two of the species (C. capitata and B. zonata). Field tests were then conducted to investigate the electronic, communication and attractive performance of the e-trap, and the accuracy of the model in classifying FF. Our results demonstrated relatively good communication, electronic performance and trapping efficacy of the e-trap. The classification model yielded average precision values of 93–95% for the three target FF from images uploaded remotely by e-traps deployed under field conditions. The developed and field-tested e-trap system complies with the suggested attributes required for an advanced camera-based smart trap.

Keywords

categories = Entomology, Agriculture, Plant Pathology, Ecology, Forestry, Plant Sciences

Body

The fruit fly trap and optical sensor

The e-trap is based on the conventional McPhail trap, developed and used by M. McPhail in 1935 to monitor the Mexican fruit fly (Anastrepha ludens) in Mexico (Steykal 1977). Since then, this trap and several variations of it have become conventional tools for monitoring FF throughout the world. The McPhail e-trap developed in this study (Fig. 1) comprises three plastic units: (a) a central cylinder (13.5 cm in diameter) bearing, at its bottom, an invagination that opens to the external environment (6 cm in diameter), (b) a lid closing the cylinder from the top, and (c) a battery box (with capacity for six rechargeable, replaceable lithium batteries) attached to the lateral external wall of the cylinder (Fig. S1). The central cylinder is internally divided into several lateral chambers that accommodate the electronics and camera, and a central chamber where entering FF are directed (by the color) onto a yellow sticky board, where they adhere and die (Figs. S2 and S3). The yellow sticky board faces a high-resolution camera (Raspberry Pi Camera Module V2, Okdo Technology LTD) activated by a Raspberry Pi Zero v1.3 microcomputer (Raspberry Pi Limited), which acquires the image and uploads it to the cloud or server via cellular communication through a 4G USB dongle (any modem compatible with the local communication system). The positions of the yellow sticky board and the camera within the trap create an optimal geometry and focus that cover the entire sticky-board surface and produce high-resolution images of the adhered insects (Fig. S4). Fruit flies are attracted to the trap using male lures (methyl eugenol, ME, for Bactrocera males; trimedlure, TML, for Ceratitis males) and protein-based products, such as Biolure, for female flies. At this stage, the e-trap is activated twice a day, sending two images per day. Energy is supplied by six lithium batteries (NCR18650PF) that allow the e-trap to function uninterruptedly for around 6–7 months. Uploaded images are processed by image-analysis algorithms developed specifically for this purpose (see following sections). The results are managed and fed into risk models and/or used to send alerts to stakeholders (Fig. 2).
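For orientation, the following Python sketch illustrates a capture-and-upload cycle of the kind described above. It is not the authors' firmware: the use of the picamera and requests libraries, the UPLOAD_URL endpoint and the TRAP_ID identifier are assumptions made purely for illustration.

```python
# Hypothetical sketch of one e-trap capture/upload cycle (not the published firmware).
import time
from datetime import datetime

import requests
from picamera import PiCamera  # driver for the Raspberry Pi Camera Module V2

UPLOAD_URL = "https://example.org/etrap/upload"  # assumed server endpoint
TRAP_ID = "etrap-001"                            # assumed trap identifier


def capture_and_upload():
    """Capture one image of the yellow sticky board and upload it over the 4G link."""
    camera = PiCamera(resolution=(3280, 2464))   # full resolution of the V2 sensor
    time.sleep(2)                                # let exposure and white balance settle
    filename = f"{TRAP_ID}_{datetime.utcnow():%Y%m%dT%H%M%S}.jpg"
    camera.capture(filename)
    camera.close()
    with open(filename, "rb") as img:
        requests.post(UPLOAD_URL, files={"image": img},
                      data={"trap": TRAP_ID}, timeout=120)


if __name__ == "__main__":
    capture_and_upload()  # the deployed trap runs this twice a day (e.g., from a scheduler)
```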

Field-test of e-trap’s fruit fly attraction, electronic functioning and communication

The e-traps were tested, and their performance compared with that of conventional traps, in five countries (Israel, Greece, Austria, Italy and South Africa). Investigation and comparison of e-trap attractiveness targeted males of B. dorsalis (South Africa, where the species is established in the northern parts of the country, and Italy, where recent interceptions were reported in 2018), B. zonata (Israel, where it is present and contained in urban and suburban Tel Aviv), and C. capitata (Israel and Greece, where it is endemic, and Austria, where interceptions of the fly are common), using methyl eugenol (ME) dispensers for the first two species and trimedlure (TML) dispensers for the third. Traps were shipped from the Agricultural Research Organization (ARO) in Israel to partners and field-deployed for several periods in different ecosystems (Table S1). E-traps were paired with conventional traps commonly used in each country to monitor these flies. An expert entomologist serviced the conventional traps once a week to once every 3 weeks (depending on location, season and distance). An expert entomologist also inspected daily the digital images uploaded by the e-traps to a Google Drive server. Data collected from individual e-traps included the daily upload of images (i.e., communication) and the count of attracted FF (i.e., attractiveness). Data from paired conventional traps included the inspection period and the number of FF attracted and trapped. These data were used to contrast trapping performance between the e-traps and the conventional traps (paired t-test). The comparison was made by counting the number of flies captured in the e-trap during the same period covered by the conventional trap, which was framed by the servicing of the conventional trap; thus, each e-trap count was reset each time the scout visited the paired conventional trap. The statistics included only periods with positive captures in either of the two traps. In addition, we derived the frequency of events in which information on the capture of the target fly by the e-trap preceded (even by one day) the information provided by the scout at the end of the visiting period. In this case, too, we used only periods with positive captures of the target FF. This provided information on the early-warning ability of the e-trap in contrast to the current practice of servicing non-automatic traps with scouts.
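As a toy illustration of the paired comparison described above (not the study data), the test can be run as follows; the count vectors are placeholders aligned per inspection period.

```python
# Illustrative paired t-test between e-trap and conventional-trap counts.
from scipy import stats

# Target-FF counts per inspection period, keeping only periods with positive
# captures in either trap, as described in the text (placeholder values).
etrap_counts = [4, 0, 7, 2, 5]
conventional_counts = [3, 1, 6, 4, 5]

t_stat, p_value = stats.ttest_rel(etrap_counts, conventional_counts)
print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")
```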

Training a deep learning detector and classifier of Bactrocera and Ceratitis fruit flies

Source of data

Images to train the detector and classifier were derived from two sources: the "FF-photographic studio" (FF-studio) and actual field images described in the previous section. The FF-studio consisted of an e-trap body with the ability to manually activate the camera and take, at any time, a picture of the yellow sticky board (populated with different settings of laboratory FF). Adhesion of FF to the yellow sticky boards was facilitated by a laboratory setup exposing the e-trap insert (bearing the yellow sticky board) to living laboratory FF (B. zonata and C. capitata) within a 40 × 40 cm Perspex cage (Fig. S5). These mock-ups were then inserted into the FF-studio system (Fig. S6), and images were produced under different illuminations by placing and manually activating the FF-studio system outdoors under different tree canopies and at different hours of the day. Pictures with diverse quantities of FF, derived from the artificial exposure of the yellow sticky board to flies in the laboratory cage and from the different illuminations, were produced and uploaded automatically by the FF-studio to the cloud. Several hundred images were obtained, simulating the actual conditions of an active e-trap with the two target FF species. Field data were derived from the e-traps deployed and tested in the field as described previously; several hundred images were obtained from this field test. Images from the field contained the three FF species, as well as other insects attracted to and adhering to the yellow sticky board (ants, lacewings, bees, etc.). Bactrocera dorsalis images were obtained from traps deployed in citrus orchards in South Africa and served to train the classifier (no images of B. dorsalis were artificially obtained with laboratory specimens). Images of all the insects obtained by these two methods were manually classified and annotated.

Data labeling and augmentation

The images acquired in the FF-studio mainly contained specimens of a single FF species, either C. capitata or B. zonata. This enabled labeling each image and using that label for every insect detected in the image. To propagate labels from an image to the insects within it, a class-agnostic detector was developed to extract image patches that contain insects, regardless of their species. Images were collected under controlled conditions, with a significant contrast between the background and the objects. We exploited this contrast and applied the "Canny" operator to detect insects in the image (Canny 1986). Canny is a classical image-processing algorithm for edge-based object detection that involves three main stages: (i) applying image smoothing for noise removal, (ii) computing the changes in the x and y directions of the image (i.e., the gradient) using Sobel/Prewitt/Roberts filters (Bhardwaj and Mittal 2012), and (iii) selecting gradients within a specific range determined by the user. The hyperparameters of the Canny algorithm were tuned to reach a good detection accuracy. The automatic annotation process is summarized in Fig. 3.
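A minimal sketch of such a class-agnostic, Canny-based detector is given below, using OpenCV; the thresholds and the minimum contour area are illustrative assumptions, not the authors' tuned hyperparameters.

```python
# Class-agnostic insect detection on FF-studio images via Canny edges (illustrative).
import cv2


def detect_insect_patches(image_path, low=50, high=150, min_area=200):
    """Return bounding boxes (x, y, w, h) of dark objects on the bright sticky board."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # (i) smoothing for noise removal
    edges = cv2.Canny(blurred, low, high)              # (ii)-(iii) gradients + thresholding
    edges = cv2.dilate(edges, None, iterations=2)      # close small gaps in the edge map
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```

Each returned box can then be cropped from the image and assigned the species of the FF colony photographed in that session.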

Annotating the images collected during the field test described earlier was more challenging, because the insects in each e-trap image included diverse species besides the target FF (no FF other than the target species were trapped). Each insect was annotated using the SuperAnnotate© tool, which allows the user to manually define the locations of objects in an image by delimiting rectangles (bounding boxes) enclosing those objects (https://www.superannotate.com). The first step consisted of drawing a bounding box for each individual insect in an image. Objects were labeled with one of 11 possible classes: the three FF species, B. zonata (PEACH-FF), C. capitata (MEDFLY) and B. dorsalis (ORIENTAL-FF), and other attracted non-Tephritid insects such as house flies, lacewings, bees and ants. No other FF were detected in any of the e-traps or conventional traps. Besides the above classes, we used an extra class, "other," for less common species encountered in the e-trap.

To enrich the dataset, a standard data augmentation method was applied. Data augmentation enriches the data by adding slightly modified copies of existing data or by creating new synthetic data from existing data. It improves classification accuracy by significantly increasing data diversity without collecting new physical samples, a strategy known to contribute to generalization and to prevent overfitting (Shorten and Khoshgoftaar 2019). Four augmentation operators were used: (1) horizontal flipping of the input image with probability 0.5 (i.e., the augmentation was applied to 50% of the images); (2) vertical flipping with probability 0.5; (3) rotating the input image by an angle between 0 and 90 degrees with probability 0.3; and (4) randomly changing the brightness and contrast of the input image with probability 0.2.
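The paper does not name the augmentation library; the sketch below reproduces the four operators and their probabilities with Albumentations, one common choice for bounding-box-aware augmentation, as an assumed implementation.

```python
# Assumed implementation of the four augmentation operators (Albumentations).
import albumentations as A

augment = A.Compose(
    [
        A.HorizontalFlip(p=0.5),             # (1) horizontal flip, p = 0.5
        A.VerticalFlip(p=0.5),               # (2) vertical flip, p = 0.5
        A.Rotate(limit=(0, 90), p=0.3),      # (3) rotation by 0-90 degrees, p = 0.3
        A.RandomBrightnessContrast(p=0.2),   # (4) brightness/contrast change, p = 0.2
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

# Usage: augmented = augment(image=image, bboxes=boxes, labels=labels)
```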

Standard partitioning of the data into three sets (train, validation and test) was used, with a ratio of 0.6/0.2/0.2, respectively. The train set is the set used to fit the model, the validation set is used to tune the hyperparameters based on the performance of the model on this data split, and the test set provides a final evaluation of the trained model on unseen data. To evaluate model performance, we randomly split the data into train/validation/test sets five times, in the spirit of k-fold cross-validation, to explore the generalization ability of the trained model (Stone 1974). We fitted the model on each training set, evaluated performance on the corresponding test set, and report the average result over the five test sets.
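The repeated splitting can be sketched as follows with scikit-learn; the image paths and labels below are placeholders standing in for the annotated dataset.

```python
# Five independent 0.6/0.2/0.2 train/validation/test splits (illustrative).
from sklearn.model_selection import train_test_split

image_paths = [f"img_{i}.jpg" for i in range(100)]  # placeholder file names
labels = [i % 3 for i in range(100)]                # placeholder class labels


def split_once(paths, y, seed):
    """One random 0.6/0.2/0.2 split into train, validation and test sets."""
    train_x, rest_x, train_y, rest_y = train_test_split(
        paths, y, test_size=0.4, random_state=seed)
    val_x, test_x, val_y, test_y = train_test_split(
        rest_x, rest_y, test_size=0.5, random_state=seed)
    return (train_x, train_y), (val_x, val_y), (test_x, test_y)


# The model is fitted on each train set; the final metric is averaged over the
# five test sets.
splits = [split_once(image_paths, labels, seed) for seed in range(5)]
```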

Training and deep learning

A Faster R-CNN model with a ResNet50 backbone (Ren et al. 2015) was trained. Faster R-CNN is a widely used deep convolutional network for object detection that has been shown to predict the locations of different objects accurately. First, a convolutional region-proposal stage scans the image and generates candidate bounding boxes. Then, a fixed-length feature vector is extracted for each candidate. Finally, a small neural network predicts the object class and the bounding-box coordinates. The model was initialized with a ResNet50 backbone pre-trained on ImageNet (a benchmark dataset for image classification and object detection) and fine-tuned on the FF-studio dataset.
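The paper does not state which framework was used; one plausible realization with torchvision is sketched below, initializing a Faster R-CNN with an ImageNet-pretrained ResNet50 backbone and replacing the box-predictor head to match the trap's classes (the exact class count is an assumption based on the annotation scheme above).

```python
# Assumed torchvision realization of the Faster R-CNN ResNet50 detector.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 1 + 11  # background + the annotated classes (count assumed from the text)

# ImageNet-pretrained ResNet50 backbone; detection heads trained from scratch.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=True)

# Replace the classification/regression head with one sized for our classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)
```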

A second deep detection model was trained using the field dataset. Specifically, we trained the same Faster R-CNN architecture that was used for the FF-studio data for 200 epochs (i.e., 200 passes of the learning algorithm over the entire training dataset). We optimized the model with stochastic gradient descent (Bottou 2010) and tuned the model hyperparameters using the validation set. Specifically, we searched over learning rates in [5e-4, 1e-3, 2e-3, 5e-3, 1e-2, 2e-2, 5e-2] and batch sizes in [1, 2, 3, 5]. The best validation loss was obtained with a learning rate of 2e-2, a momentum of 0.9 and a batch size of 1. We used the standard loss functions of Faster R-CNN: a cross-entropy loss for classification and an L2 loss for bounding-box regression. Results were evaluated using standard metrics for object detection. Specifically, we computed the average precision per class and the true-positive rate using these basic definitions:

$${\text{Accuracy}} = \frac{{\sum {\left( {{\text{label}}\left( {{\text{Box}}_{{{\text{pred}}}} } \right) = {\text{label}}\left( {{\text{Box}}_{{{\text{gt}}}} } \right)} \right)} }}{{\sum {{\text{Box}}_{{{\text{pred}}}} } }}$$

(1)

$${\text{Precision}} = \frac{{\sum {{\text{TP}}} }}{{\sum {{\text{TP}}} + \sum {{\text{FP}}} }}$$

(2)

$${\text{Recall}} = \frac{{\sum {{\text{TP}}} }}{{\sum {{\text{TP}}} + \sum {{\text{FN}}} }}$$

(3)

$${\text{IoU}} = \frac{{{\text{area}}\left( {B_{{{\text{pred}}}} \cap B_{{{\text{gt}}}} } \right) }}{{{\text{area}}\left( {B_{{{\text{pred}}}} \cup B_{{{\text{gt}}}} } \right)}}$$

(4)

Here, a true positive (TP) is a correct detection (a detection with Intersection over Union, IoU ≥ threshold), a false positive (FP) is a wrong detection (a detection with IoU < threshold), and a false negative (FN) is a case where a ground-truth object is not detected. True negatives (TN) do not apply, since in object detection the data include many possible bounding boxes that should not be detected (i.e., the background) (Padilla et al. 2021). Using these terms, we computed precision and recall values for all classes with an IoU threshold of 0.5. We then constructed a precision–recall curve, in which each data point represents the precision and recall values for a specific score threshold varying from 0 to 1, and computed the average precision (AP) for each class. For the accuracy metric, we first filtered out all predictions with an IoU lower than 0.5 with any ground-truth box and all predictions with confidence scores lower than 0.5, keeping only valid predictions; we then counted the ratio between correct classifications and the total number of valid predictions.
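For reference, Eq. (4) corresponds to the small helper below, with boxes given as (x_min, y_min, x_max, y_max) tuples.

```python
# Intersection over Union (IoU) between a predicted and a ground-truth box, cf. Eq. (4).
def iou(box_pred, box_gt):
    x1 = max(box_pred[0], box_gt[0])
    y1 = max(box_pred[1], box_gt[1])
    x2 = min(box_pred[2], box_gt[2])
    y2 = min(box_pred[3], box_gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)                       # overlap area
    area_pred = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    return inter / (area_pred + area_gt - inter)


# A detection counts as a true positive when iou(...) >= 0.5 and the predicted
# class matches the ground-truth label.
```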

Evaluation of the trained model using field data

Validation consisted of deploying e-traps in the field for 30 days during June 2021. Five e-traps baited with ME were deployed in suburban areas of Tel Aviv, Israel, to capture male B. zonata, and five e-traps were deployed in orchards in northern Israel, loaded with TML (three e-traps) and Biolure (two e-traps), to capture male and female C. capitata. The use of Biolure was intended to test the ability of the trap to capture female C. capitata and, because this bait also attracts several other non-Tephritidae insects, to test the precision of the code when several other insect species are present in the image. In addition, we used images generated during the exposure of the ME-baited e-traps in South Africa (Sect. 2.2) that had not been used for training the code; the pictures from the e-traps in South Africa contained images of male B. dorsalis. These images were automatically uploaded to the cloud and processed with the developed deep-learning detector and classifier code (see Sect. 2.3). The results produced by the code were then evaluated by an entomologist who inspected the uploaded images and contrasted the automatic results with the actual images. The results of this comparison were summarized using a confusion-matrix approach (Table S2). The image-analysis classification code has been uploaded to a repository: https://github.com/ydiller/insect_detection/tree/public (the main files are train.py and test.py).
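As a purely illustrative example of the confusion-matrix summary (not the data behind Table S2), the comparison between the entomologist's labels and the code's predictions could be tabulated as follows.

```python
# Toy confusion matrix between expert and model labels (placeholder values).
from sklearn.metrics import confusion_matrix

classes = ["MEDFLY", "PEACH-FF", "ORIENTAL-FF", "other"]    # subset of the classes
expert_labels = ["MEDFLY", "MEDFLY", "other", "PEACH-FF"]   # placeholder ground truth
model_labels = ["MEDFLY", "other", "other", "PEACH-FF"]     # placeholder predictions

print(confusion_matrix(expert_labels, model_labels, labels=classes))
```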