Abstract
Web text that describes disaster events in natural language contains a considerable amount of disaster information. Automatically extracting this information (e.g., time, location, casualties, and disaster losses) from web text is an important supplement to conventional disaster monitoring data. This study extracted and compared the characteristics of earthquake disaster information from web news media reports (news reports) and online disaster reduction agency reports (professional reports). Using earthquakes in China from 2015 to 2017 as a case study, a series of rules was created for extracting earthquake event information, including temporal extraction rules, a location trigger dictionary, and an attribute trigger dictionary. The differences in the characteristics of news reports and professional reports were investigated in terms of their quantity and spatiotemporal distribution through statistical analysis, geocoding, and kernel density estimation. The information extracted from each set of reports was also compared with authoritative data. The results indicated that news reports are more extensive and carry richer information. In contrast, professional reports are less repetitive as well as more accurate and standardized, mainly focusing on earthquakes with Ms = 4 and/or earthquakes that may cause damage. These characteristics of disaster information from different web text sources can be used to improve the efficiency of disaster information extraction and analysis. In addition, the rule-based approach proposed herein was found to be an accurate and viable way to extract earthquake information from web texts. The approach provides a technical basis and background information to support further research seeking human-centric disaster information from web text, information that cannot be acquired using traditional instrument monitoring methods.
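The rule-based extraction described above can be sketched as a minimal example. The patterns and trigger words below are hypothetical stand-ins for illustration only; the study's actual rule set is much larger and targets Chinese-language reports:

```python
import re

# Hypothetical miniature rule set, not the study's actual dictionaries.
TIME_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")                # temporal extraction rule
MAGNITUDE_PATTERN = re.compile(r"\bM[sw]?\s*([0-9]+(?:\.[0-9]+)?)")  # attribute trigger (magnitude)
LOCATION_TRIGGERS = ["County", "City", "Province"]                   # location trigger words

def extract_event(text):
    """Apply the rule set to one report and return the matched fields."""
    event = {}
    if m := TIME_PATTERN.search(text):
        event["time"] = m.group(1)
    if m := MAGNITUDE_PATTERN.search(text):
        event["magnitude"] = float(m.group(1))
    # Take the word preceding a location trigger as a crude place name.
    for trigger in LOCATION_TRIGGERS:
        if m := re.search(r"(\w+)\s+" + trigger, text):
            event["location"] = f"{m.group(1)} {trigger}"
            break
    return event

report = "On 2017-08-08 an Ms 7.0 earthquake struck Jiuzhaigou County."
print(extract_event(report))
```

In the study's pipeline, the extracted location strings would then be geocoded to coordinates so that the spatiotemporal distribution of reports can be analyzed with kernel density estimation.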