Abstract
Smart agriculture relies on accurate yield maps as a crucial decision-making tool. However, many yield maps suffer from spatial errors that can compromise the quality of their data. Although several approaches have been proposed to address some of these errors, detecting voids (holes) in the maps remains challenging. In addition, the quality of yield datasets is typically evaluated using the root mean squared error (RMSE) after interpolation. This evaluation relies on weighbridge data, which can occasionally be inaccurate and thus affect the quality of decisions made with the datasets. This paper introduces a novel algorithm for identifying voids in yield maps. It also maps three types of spatial errors (GPS errors, yield surges, and voids) to two standard data quality dimensions (accuracy and completeness), producing a quality score that can be used to assess yield datasets without weighbridge data. The paper carries out three evaluations: (1) measuring the algorithm's efficacy on a dataset containing fields with and without voids; (2) assessing the benefits of integrating void detection and other spatial error identification techniques into the yield data processing chain; and (3) examining the correlation between the RMSE and the proposed quality score before and after spatial errors are filtered out. The results show that the proposed algorithm achieves 100% sensitivity, 91% specificity, and 82% accuracy in identifying yield maps with voids. Additionally, the RMSE decreases when various spatial errors, including voids, are filtered out with the proposed data pre-processing chain.
The inverse correlation between the root mean squared error and the proposed quality score (-0.577 before and -0.793 after filtering out spatial errors) indicates that the quality score can effectively assess the quality of yield datasets. The score can therefore be integrated into real-time big data quality assessment solutions built on multiple data quality dimensions.
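The abstract does not state how the quality score is computed, so the following Python sketch is illustrative only. It assumes a hypothetical `quality_score` that treats GPS errors and yield surges as accuracy defects (share of flagged points) and voids as completeness defects (share of uncovered field area), and averages the two dimensions. It then computes a Pearson correlation coefficient, assuming that is the correlation measure used to relate per-dataset scores to root mean squared errors. The function names, inputs, weighting, and example values are all assumptions, not the paper's method or data.

```python
import math


def quality_score(n_points, n_gps_errors, n_surges, field_area, void_area):
    """Hypothetical quality score in [0, 1]; NOT the paper's formula.

    Assumes GPS errors and yield surges flag disjoint sets of points
    (accuracy) and voids are measured as uncovered field area (completeness).
    """
    if n_points <= 0 or field_area <= 0:
        raise ValueError("n_points and field_area must be positive")
    accuracy = 1.0 - (n_gps_errors + n_surges) / n_points
    completeness = 1.0 - void_area / field_area
    # Assumed aggregation: unweighted mean of the two quality dimensions.
    return (accuracy + completeness) / 2.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical per-dataset values, for illustration only.
scores = [
    quality_score(1000, 80, 40, 10.0, 2.0),
    quality_score(1000, 50, 30, 10.0, 1.0),
    quality_score(1000, 20, 10, 10.0, 0.5),
]
rmse = [0.9, 0.6, 0.4]
r = pearson(rmse, scores)  # negative: higher score goes with lower RMSE
```

With these made-up inputs the coefficient comes out negative, which is the direction of the relationship the abstract reports; the magnitude here carries no meaning about the real datasets.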