Resumen
Hyperspectral remote sensing can be used to effectively identify contaminated elements in soil. However, in the field of monitoring soil heavy metal pollution, hyperspectral remote sensing has the characteristics of high dimensionality and high redundancy, which seriously affect the accuracy and stability of hyperspectral inversion models. To resolve the problem, a gradient boosting regression tree (GBRT) hyperspectral inversion algorithm for heavy metal (Arsenic (As)) content in soils based on Spearman?s rank correlation analysis (SCA) coupled with competitive adaptive reweighted sampling (CARS) is proposed in this paper. Firstly, the CARS algorithm is used to roughly select the original spectral data. Second derivative (SD), Gaussian filtering (GF), and min-max normalization (MMN) pretreatments are then used to improve the correlation between the spectra and As in the characteristic band enhancement stage. Finally, the low-correlation bands are removed using the SCA method, and a subset with absolute correlation values greater than 0.6 is retained as the optimal band subset after each pretreatment. For the modeling, the five most representative characteristic bands were selected in the Honghu area of China, and the nine most representative characteristic bands were selected in the Daye area of China. In order to verify the generalization ability of the proposed algorithm, 92 soil samples from the Honghu and Daye areas were selected as the research objects. With the use of support vector machine regression (SVMR), linear regression (LR), and random forest (RF) regression methods as comparative methods, all the models obtained a good prediction accuracy. However, among the different combinations, CARS-SCA-GBRT obtained the highest precision, which indicates that the proposed algorithm can select fewer characteristic bands to achieve a better inversion effect, and can thus provide accurate data support for the treatment and recovery of heavy metal pollution in soils.