Resumen
The concentration of PM2.5 is an important index to measure the degree of air pollution. When it exceeds the standard value, it is considered to cause pollution and lower the air quality, which is harmful to human health and can cause a variety of diseases, i.e., asthma, chronic bronchitis, etc. Therefore, the prediction of PM2.5 concentration is helpful to reduce its harm. In this paper, a hybrid model called CNN-BiLSTM-Attention is proposed to predict the PM2.5 concentration over the next two days. First, we select the PM2.5 concentration data in hours from January 2013 to February 2017 of Shunyi District, Beijing. The auxiliary data includes air quality data and meteorological data. We use the sliding window method for preprocessing and dividing the corresponding data into a training set, a validation set, and a test set. Second, CNN-BiLSTM-Attention is composed of the convolutional neural network, bidirectional long short-term memory neural network, and attention mechanism. The parameters of this network structure are determined by the minimum error in the training process, including the size of the convolution kernel, activation function, batch size, dropout rate, learning rate, etc. We determine the feature size of the input and output by evaluating the performance of the model, finding out the best output for the next 48 h. Third, in the experimental part, we use the test set to check the performance of the proposed CNN-BiLSTM-Attention on PM2.5 prediction, which is compared by other comparison models, i.e., lasso regression, ridge regression, XGBOOST, SVR, CNN-LSTM, and CNN-BiLSTM. We conduct short-term prediction (48 h) and long-term prediction (72 h, 96 h, 120 h, 144 h), respectively. The results demonstrate that even the predictions of the next 144 h with CNN-BiLSTM-Attention is better than the predictions of the next 48 h with the comparison models in terms of mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2).