Resumen
The ?Kyoho? (Vitis labruscana) grape is one of the mainly fresh fruits; it is important to accurately segment the grape bunch and to detect its maturity level for the construction of an intelligent grape orchard. Grapes in the natural environment have different shapes, occlusion, complex backgrounds, and varying illumination; this leads to poor accuracy in grape maturity detection. In this paper, an improved Mask RCNN-based algorithm was proposed by adding attention mechanism modules to establish a grape bunch segmentation and maturity level detection model. The dataset had 656 grape bunches of different backgrounds, acquired from a grape growing environment of natural conditions. This dataset was divided into four groups according to maturity level. In this study, we first compared different grape bunch segmentation and maturity level detection models established with YoloV3, Solov2, Yolact, and Mask RCNN to select the backbone network. By comparing the performances of the different models established with these methods, Mask RCNN was selected as the backbone network. Then, three different attention mechanism modules, including squeeze-and-excitation attention (SE), the convolutional block attention module (CBAM), and coordinate attention (CA), were introduced to the backbone network of the ResNet50/101 in Mask RCNN, respectively. The results showed that the mean average precision (mAP) and mAP0.75 and the average accuracy of the model established with ResNet101 + CA reached 0.934, 0.891, and 0.944, which were 6.1%, 4.4%, and 9.4% higher than the ResNet101-based model, respectively. The error rate of this model was 5.6%, which was less than the ResNet101-based model. In addition, we compared the performances of the models established with MASK RCNN, adding different attention mechanism modules. The results showed that the mAP and mAP0.75 and the accuracy for the Mask RCNN50/101 + CA-based model were higher than those of the Mask RCNN50/101 + SE- and Mask RCNN50/101 + CBAM-based models. Furthermore, the performances of the models constructed with different network layers of ResNet50- and ResNet101-based attention mechanism modules in a combination method were compared. The results showed that the performance of the ResNet101-based combination with CA model was better than the ResNet50-based combination with CA model. The results showed that the proposed model of Mask RCNN ResNet101 + CA was good for capturing the features of a grape bunch. The proposed model has practical significance for the segmentation of grape bunches and the evaluation of the grape maturity level, which contributes to the construction of intelligent vineyards.