Deep Learning Object Detection During Pepper Harvesting: Comparing YOLO, Faster R-CNN, and SSD with Multiple Feature Extractors
Abstract
Artificial intelligence (AI) methods based on Convolutional Neural Networks (CNNs) have gained great relevance in agricultural applications in recent years. Training CNNs on images to localize and categorize multiple relevant objects during harvesting tasks can improve decision-making across various agricultural scenarios. Approaches based on object detection, segmentation, and classification are still being studied with great interest for these applications. Nevertheless, despite the promising progress achieved in the last decade, challenges remain, including the selection of appropriate algorithms, the amount and quality of datasets, and the variance and bias among categories within each dataset distribution. In this work, we take all of these factors into account to compare the performance of three different image-based object detection CNNs on a dataset obtained during pepper harvesting tasks. The detection methods are YOLOv2, Faster R-CNN, and Single Shot MultiBox Detector (SSD). For each detector, we tested different feature extraction models, including GoogLeNet, ResNet-18, and ShuffleNet, to detect various object categories such as humans, boxes, peppers, pepper groups, and pushcarts. We employed a total of 420 images obtained from a Chilean pepper farm during the training process. During testing, we achieved average precision (AP) values of up to 99% for the YOLOv2 model, 91% for SSD, and 87% for Faster R-CNN. These results demonstrate that it is possible to successfully implement CNNs for the detection of multiple objects during pepper harvesting processes, and that YOLOv2 works particularly well for this application.