Abstract
Data generated by social media platforms such as Twitter are classified as big data, and they can provide a wide range of resources to study areas including disaster management, tourism, political science, and health. Beyond acquiring the data, however, scientists are concerned about their reliability and accuracy, that is, whether the use of social media data (SMD) can lead to incorrect or unreliable inferences. Many studies have analyzed SMD to investigate their reliability, accuracy, or credibility, but they have not addressed the filtering techniques applied to the data after acquisition and before the results are produced. This study provides a methodology for determining the accuracy and reliability of filtering techniques for SMD. It then introduces a spatial similarity index that analyzes and compares spatial intersection, proximity, and size. Finally, we compare combinations of filtering techniques and similarity indices to identify the one best suited to creating event maps of SMD with the Getis-Ord Gi* technique. The steps of this study can be summarized as follows: first, an investigation of domain-based text filtering techniques, covering sentiment lexicons, the reliability of machine learning-based sentiment analysis, and the development of intermediate codes specific to domain-based studies; then, the use of various similarity indices to determine the spatial reliability and accuracy of maps of the filtered SMD. The study identifies the best combination of filtering, mapping, and spatial accuracy assessment methods for SMD, especially in emergencies, where spatial information is urgently required. As a result, a new similarity index based on spatial intersection, spatial size, and proximity relationships is introduced to determine the spatial accuracy of the fine-filtered SMD.
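The abstract names the Getis-Ord Gi* statistic as the hot-spot technique used to build event maps from filtered SMD. For readers unfamiliar with it, here is a minimal pure-Python sketch of the standard Gi* z-score. It assumes the filtered posts have already been aggregated into counts per grid cell, and it uses binary distance-band weights that include each cell in its own neighbourhood. The function names, the cell layout, and the band width are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def distance_band_weights(coords: Sequence[Tuple[float, float]], band: float) -> List[List[float]]:
    """Binary spatial weights (an assumed scheme): w_ij = 1 if cells i and j lie
    within `band` of each other, else 0. The diagonal is 1 because the distance
    from a cell to itself is 0, which is what Gi* (as opposed to Gi) requires."""
    n = len(coords)
    return [
        [1.0 if math.dist(coords[i], coords[j]) <= band else 0.0 for j in range(n)]
        for i in range(n)
    ]


def getis_ord_gi_star(values: Sequence[float], weights: List[List[float]]) -> List[float]:
    """Return the Gi* z-score for every cell.

    Gi*_i = (sum_j w_ij x_j - mean * sum_j w_ij)
            / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
    with S = sqrt(sum_j x_j^2 / n - mean^2).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(x * x for x in values) / n - mean * mean
    if variance <= 0:
        raise ValueError("Gi* is undefined when all cell values are equal")
    s = math.sqrt(variance)

    z_scores = []
    for i in range(n):
        w_i = weights[i]
        sum_w = sum(w_i)
        sum_w2 = sum(w * w for w in w_i)
        # Weighted local sum minus its expectation under spatial randomness.
        numerator = sum(w * x for w, x in zip(w_i, values)) - mean * sum_w
        spread = (n * sum_w2 - sum_w ** 2) / (n - 1)
        if spread <= 0:
            # The neighbourhood covers every cell, so there is no local contrast.
            z_scores.append(math.nan)
            continue
        z_scores.append(numerator / (s * math.sqrt(spread)))
    return z_scores


if __name__ == "__main__":
    # Hypothetical counts of filtered posts in six cells along a street.
    cells = [(float(x), 0.0) for x in range(6)]
    counts = [1, 2, 1, 9, 10, 8]
    for cell, z in zip(cells, getis_ord_gi_star(counts, distance_band_weights(cells, 1.0))):
        print(cell, round(z, 3))
```

A large positive z-score marks a cluster of high counts (a candidate hot spot), and a large negative one marks a cluster of low counts. Under the usual normal approximation, |z| > 1.96 is often read as significant at about the 5% level. The choice of weights (distance band, k-nearest neighbours, or contiguity) strongly affects the resulting map.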
The motivation for this research is the ability to create an incident map shortly after a disaster such as a bombing. The proposed methodology can, however, also be applied to other domains, such as concerts, elections, natural disasters, and marketing.