Abstract
Land-use identification, which quantifies the types and intensity of human activities at a regional level, is a critical investigative step in ongoing land-use planning. One limitation of current land-use identification practice is its reliance on theory-driven models built on survey and socioeconomic data, which are often costly and time-consuming to collect. Another is that most of these identification methods cannot incorporate the effect of daily human activity, so significant spatial heterogeneity is ignored. In this context, a novel land-use identification framework is proposed that quantifies land-use characteristics using traffic-flow and traffic-event data. For the identification models, two widely used ensemble learning methods, Random Forest and AdaBoost, are introduced to classify the land-use type and to fit the land-use density. The case study collected transit vehicle positions, traffic events, and geo-tagged data at the regional level in the San Francisco Bay Area, California. The results demonstrate that the framework with ensemble learning identifies land-use characteristics with high accuracy in both the type-classification and density-regression tasks. On average, compared with the other models, it improved Area Under the ROC Curve (AUC), Classification Accuracy (CA), F-Measure (F1), Precision, and Recall by 12.63%, 12.84%, 11.05%, 5.44%, and 12.84%, respectively, in the classification tasks, and improved Mean Squared Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) by 56.81%, 21.20%, and 47.29%, respectively, in the regression tasks. The Random Forest model performs better on labels with high regularity, such as education, residence, and work activities. Beyond accuracy, a correlation analysis of the error term also showed that the results are consistent with people's common-sense understanding of land-use characteristics, demonstrating the interpretability of the proposed framework.
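The abstract reports gains in AUC, CA, F1, Precision, and Recall for the classification tasks and in MSE, RMSE, and MAE for the regression tasks, but it does not state how these metrics or the percentage improvements are computed. The Python sketch below shows the standard definitions of these metrics, under several assumptions. The function names are hypothetical. The classification metrics are written for a single binary label, which simplifies the multi-class land-use types; a multi-class setting would typically average per-class scores. The relative-improvement formula is one plausible definition, not the authors' stated method.

```python
import math
from typing import Sequence


def regression_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Standard MSE, RMSE and MAE for paired true/predicted values (e.g. land-use density)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


def binary_classification_scores(y_true: Sequence, y_pred: Sequence, positive=1) -> dict:
    """Accuracy (CA), Precision, Recall and F1, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"CA": correct / len(pairs), "Precision": precision, "Recall": recall, "F1": f1}


def roc_auc(y_true: Sequence, scores: Sequence[float], positive=1) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def relative_improvement(baseline: float, model: float, lower_is_better: bool) -> float:
    """Percentage improvement of `model` over `baseline` (assumed definition).

    For error metrics (MSE, RMSE, MAE) a lower value is better, so the
    improvement is the relative reduction; for scores (AUC, CA, F1, ...)
    it is the relative increase.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = baseline - model if lower_is_better else model - baseline
    return 100.0 * delta / abs(baseline)


if __name__ == "__main__":
    print(regression_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(binary_classification_scores([1, 0, 1, 1], [1, 0, 0, 1]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]))
    print(relative_improvement(2.0, 1.5, lower_is_better=True))  # 25.0
```

Computing the relative improvement separately for each metric matters because the percentages do not follow from one another. For example, a given relative reduction in MSE does not imply the same relative reduction in RMSE, since RMSE is the square root of MSE.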