Resumen
Camellia oleifera fruits are randomly distributed in an orchard, and the fruits are easily blocked or covered by leaves. In addition, the colors of leaves and fruits are alike, and flowers and fruits grow at the same time, presenting many ambiguities. The large shock force will cause flowers to fall and affect the yield. As a result, accurate positioning becomes a difficult problem for robot picking. Therefore, studying target recognition and localization of Camellia oleifera fruits in complex environments has many difficulties. In this paper, a fusion method of deep learning based on visual perception and image processing is proposed to adaptively and actively locate fruit recognition and picking points for Camellia oleifera fruits. First, to adapt to the target classification and recognition of complex scenes in the field, the parameters of the You Only Live Once v7 (YOLOv7) model were optimized and selected to achieve Camellia oleifera fruits? detection and determine the center point of the fruit recognition frame. Then, image processing and a geometric algorithm are used to process the image, segment, and determine the morphology of the fruit, extract the centroid of the outline of Camellia oleifera fruit, and then analyze the position deviation of its centroid point and the center point in the YOLO recognition frame. The frontlighting, backlight, partial occlusion, and other test conditions for the perceptual recognition processing were validated with several experiments. The results demonstrate that the precision of YOLOv7 is close to that of YOLOv5s, and the mean average precision of YOLOv7 is higher than that of YOLOv5s. For some occluded Camellia oleifera fruits, the YOLOv7 algorithm is better than the YOLOv5s algorithm, which improves the detection accuracy of Camellia oleifera fruits. The contour of Camellia oleifera fruits can be extracted entirely via image processing. The average position deviation between the centroid point of the image extraction and the center point of the YOLO recognition frame is 2.86 pixels; thus, the center point of the YOLO recognition frame is approximately considered to be consistent with the centroid point of the image extraction.