Smart Agriculture ›› 2020, Vol. 2 ›› Issue (1): 99-110. doi: 10.12133/j.smartag.2020.2.1.202001-SA004

• Topic--Agricultural Remote Sensing and Phenotyping Information Acquisition Analysis

Apple detection model based on lightweight anchor-free deep convolutional neural network

Xia Xue1,2, Sun Qixin1,2, Shi Xiao1,2, Chai Xiujuan1,2

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  2. Key Laboratory of Agricultural Big Data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
  • Received: 2020-01-21 Revised: 2020-02-19 Online: 2020-03-30


Intelligent production and robotic operation are an efficient and sustainable agronomic route to cutting economic and environmental costs and boosting orchard productivity. In a real orchard, a high-performance visual perception system is the prerequisite for accurate and reliable operation of an automatic cultivation platform. Most existing apple detection models, however, have too many parameters and too large a model volume to be deployed on platforms with limited computing power and storage capacity. To improve the performance and adaptability of apple detection under limited hardware resources, that is, to reduce the model's computation and storage footprint and shorten detection time while maintaining detection accuracy, this study improved the lightweight MobileNetV3 network and combined it with CenterNet, an object detection network based on keypoint prediction, to build a lightweight anchor-free model (M-CenterNet) for apple detection. The proposed model uses a heatmap to search for the center point (keypoint) of each object by predicting whether each pixel is the center point of an apple; the local offset of the keypoint and the size of the apple are then estimated from the extracted center point, without the need for grouping or Non-Maximum Suppression (NMS). Given its advantages in model volume and speed, the improved MobileNetV3, equipped with transposed convolutional layers to obtain better semantic and location information, was used as the backbone of the network. The proposed model was compared with CenterNet and SSD (Single Shot Multibox Detector) in terms of comprehensive performance, detection accuracy, model capacity and running speed.
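The keypoint decoding step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `decode_centers`, the score threshold of 0.3, the output stride of 4, and the convention that sizes are given in input-image pixels are all assumptions made for illustration. A heatmap cell is kept as a center only if it is the maximum of its 3x3 neighbourhood, which is the kind of local peak test that lets a keypoint-based detector skip NMS.

```python
import numpy as np

def decode_centers(heatmap, offset, size, threshold=0.3, stride=4):
    """Turn keypoint-style predictions into boxes (illustrative sketch only).

    heatmap: (H, W) center scores in [0, 1]
    offset:  (2, H, W) sub-cell offsets (dx, dy) of each center
    size:    (2, H, W) box width and height, assumed to be in input pixels
    stride:  assumed ratio between input resolution and heatmap resolution
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Maximum over each 3x3 neighbourhood, built from nine shifted views.
    neighbours = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    local_max = neighbours.max(axis=0)
    # A cell is a center if it is a local peak and scores above the threshold.
    peaks = (heatmap == local_max) & (heatmap >= threshold)
    boxes = []
    for y, x in zip(*np.nonzero(peaks)):
        cx = (x + offset[0, y, x]) * stride  # refined center, input pixels
        cy = (y + offset[1, y, x]) * stride
        bw, bh = size[0, y, x], size[1, y, x]
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2,
                      float(heatmap[y, x])))
    return boxes
```

In this sketch a weaker response next to a strong peak is dropped by the local-maximum test alone, so no pairwise box overlap computation is needed.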
The results showed that the average precision, error rate and miss rate of the proposed model were 88.9%, 10.9% and 5.8%, respectively, and that its model volume and frame rate were 14.2 MB and 8.1 fps. The proposed model adapts well to the environment and detects apples effectively under varying illumination, different degrees of occlusion, and different fruit distances and numbers. Compared with the CenterNet model, the proposed model was only 1/4 of the size while achieving comparable detection accuracy. Compared with the SSD model, the average precision of the proposed model increased by 3.9% and the model volume decreased by 84.3%. On a CPU, the proposed model ran almost twice as fast as the CenterNet and SSD models. This study provides a new approach for research on lightweight fruit detection models for orchard mobile platforms in unstructured environments.
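As a back-of-envelope reading of the reported figures, the short sketch below derives the per-frame latency and the baseline model sizes that the stated ratios would imply. The derived values are estimates computed only from the numbers above, not results reported by the authors, and "1/4" is treated as exact although the abstract gives it only approximately. The variable names are illustrative.

```python
# Figures reported in the abstract.
model_size_mb = 14.2      # M-CenterNet model volume
frame_rate_fps = 8.1      # M-CenterNet frame rate
ssd_size_reduction = 0.843  # volume decrease relative to SSD
centernet_size_ratio = 0.25  # "1/4 of the size" of CenterNet (approximate)

# Derived estimates (not reported values).
latency_ms = 1000.0 / frame_rate_fps
implied_ssd_mb = model_size_mb / (1.0 - ssd_size_reduction)
implied_centernet_mb = model_size_mb / centernet_size_ratio

print(f"per-frame latency: about {latency_ms:.1f} ms")
print(f"implied SSD size: about {implied_ssd_mb:.1f} MB")
print(f"implied CenterNet size: about {implied_centernet_mb:.1f} MB")
```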

Key words: machine vision, deep learning, lightweight network, anchor-free, apple detection
