Resumen
Since the recent outbreak of COVID-19, many scientists have started working on distinct challenges related to mining the available large datasets from social media as an effective asset to understand people?s responses to the pandemic. This study presents a comprehensive social data mining approach to provide in-depth insights related to the COVID-19 pandemic and applied to the Arabic language. We first developed a technique to infer geospatial information from non-geotagged Arabic tweets. Secondly, a sentiment analysis mechanism at various levels of spatial granularities and separate topic scales is introduced. We applied sentiment-based classifications at various location resolutions (regions/countries) and separate topic abstraction levels (subtopics and main topics). In addition, a correlation-based analysis of Arabic tweets and the official health providers? data will be presented. Moreover, we implemented several mechanisms of topic-based analysis using occurrence-based and statistical correlation approaches. Finally, we conducted a set of experiments and visualized our results based on a combined geo-social dataset, official health records, and lockdown data worldwide. Our results show that the total percentage of location-enabled tweets has increased from 2% to 46% (about 2.5M tweets). A positive correlation between top topics (lockdown and vaccine) and the COVID-19 new cases has also been recorded, while negative feelings of Arab Twitter users were generally raised during this pandemic, on topics related to lockdown, closure, and law enforcement.