Abstract
This paper addresses the regression modeling of local environmental pollution levels for the needs of the electric power industry. Such modeling is fundamental to the proper design and maintenance of high-voltage transmission lines and insulators, because it helps prevent hazards such as pollution-induced flashovers and the resulting power outages. The primary goal of our study was to increase the precision of regression models in this application area in two ways: by exploiting additional input attributes extracted from satellite imagery, and by adjusting the modeling methodology. Because thousands of different attributes can be extracted from satellite images, and only a few of them are likely to contain useful information, we also explored suitable feature selection procedures. We show that a suitable combination of attribute selection methods (relief, FSRF-Test, and forward selection), regression models (random forests and M5P regression trees), and modeling methodology (estimating the field-measured values of the target variables rather than their upper bounds) can significantly increase overall modeling accuracy, measured as the correlation between the estimated and true values of the target variables. Specifically, the accuracy of our regression models rose from 0.12–0.23 to 0.40–0.64, while their relative absolute errors decreased (e.g., from 1.04 to 0.764 for the best model).
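The abstract reports accuracy as a correlation coefficient and error as a relative absolute error (RAE) but does not define either. The sketch below assumes the standard definitions: the Pearson correlation between true and estimated values, and RAE as the total absolute error divided by the total absolute error of a baseline that always predicts the mean of the true values. Taking that mean over the evaluated values themselves is a simplifying assumption; tools such as WEKA may instead compute it from the training data. Under this definition, an RAE above 1, such as the 1.04 cited for the original model, would mean the model did slightly worse than simply predicting the mean.

```python
import math


def pearson_correlation(y_true, y_pred):
    """Pearson correlation between true and estimated target values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Covariance and variances (the common 1/n factors cancel out).
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def relative_absolute_error(y_true, y_pred):
    """Total absolute error relative to always predicting the mean of y_true.

    Assumption: the baseline mean is computed from y_true itself.
    """
    mean_t = sum(y_true) / len(y_true)
    model_error = sum(abs(p - t) for t, p in zip(y_true, y_pred))
    baseline_error = sum(abs(t - mean_t) for t in y_true)
    return model_error / baseline_error


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]  # hypothetical field measurements
    print(relative_absolute_error(y_true, [2.5] * 4))  # mean predictor: 1.0
    print(relative_absolute_error(y_true, y_true))     # perfect model: 0.0
    print(pearson_correlation(y_true, y_true))         # perfect model: 1.0
```

Because RAE is scaled against the mean predictor, it can be compared across target variables with different units, which a raw absolute error cannot.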