ARTICLE
TITLE

Capturing and Characterizing Human Activities Using Building Locations in America

Zheng Ren    
Bin Jiang and Stefan Seipel    

Abstract

Capturing and characterizing collective human activities in geographic space has become much easier than ever before in the big data era. In past decades, it was difficult to acquire spatiotemporal information about human beings. Thanks to the boom in mobile devices with integrated positioning systems and in location-based social media data, we can now easily acquire the spatial and temporal information of social media users. Previous studies have successfully used street nodes and geo-tagged social media data such as Twitter to predict users' activities. However, whether human activities can be well represented by social media data remains uncertain. Buildings, on the other hand, are permanent and reliable representations of collective human activities, recorded as historical footprints. This study aims to use big data on US building footprints to investigate the reliability of social media users for human activity prediction. We created spatial clusters from 125 million buildings and 1.48 million Twitter points in the US, and we examined and compared the spatial and statistical distributions of the clusters at both country and city levels. The results show that both the building and the Twitter spatial clusters exhibit a scaling pattern in cluster size, measured by the number of points inside a cluster and by the cluster area. More specifically, at the country level, the statistical distribution of the building spatial clusters fits a power-law distribution. Inside the four largest cities, the hotspots are power-law distributed with an exponent of around 2.0, meaning that they also follow Zipf's law. The correlations between the number of buildings and the number of tweets are substantial, with R² ranging from 0.53 to 0.74.
The high correlation and the similarity of the two datasets in terms of spatial and statistical distribution suggest that, although social media users are only a proportion of the entire population, the spatial clusters derived from geographical big data are a good and accurate representation of overall human activities. This study also indicates that an improved spatial clustering method is more suitable for big data analysis than conventional clustering methods based on Euclidean geometry.
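
As a rough illustration of the two statistics mentioned above (a power-law exponent near 2.0 and an R² between paired counts), the following Python sketch may help. It is not the authors' code: it assumes the standard continuous maximum-likelihood estimator for the exponent and a plain squared Pearson correlation, and it runs on synthetic data. The function names `powerlaw_alpha_mle`, `r_squared`, and `sample_powerlaw`, and the lower cutoff `xmin`, are hypothetical choices made for this example.

```python
import math
import random


def powerlaw_alpha_mle(sizes, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    Uses alpha = 1 + n / sum(ln(x / xmin)) over all values x >= xmin.
    The cutoff xmin is an assumption supplied by the caller.
    """
    tail = [x for x in sizes if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above xmin")
    return 1.0 + len(tail) / log_sum


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def sample_powerlaw(n, alpha, xmin, rng):
    """Draw n synthetic values from a continuous power law (inverse transform)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Synthetic "cluster sizes" generated with a true exponent of 2.0.
rng = random.Random(42)
cluster_sizes = sample_powerlaw(20000, alpha=2.0, xmin=1.0, rng=rng)
alpha_hat = powerlaw_alpha_mle(cluster_sizes, xmin=1.0)

# Hypothetical paired counts per cluster (perfectly proportional here).
buildings = [1, 2, 3, 4]
tweets = [2, 4, 6, 8]
r2 = r_squared(buildings, tweets)
```

On real data, the estimated exponent depends strongly on the choice of `xmin`, and whether counts are log-transformed before correlating them changes the R² considerably; the abstract does not specify either choice, so both are left to the reader.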