Abstract
In this work, we present a novel approach to understanding the quality of public transit systems in resource-constrained regions using user-generated content. With growing urban populations, managing travel demand effectively is becoming increasingly difficult. The problem is more prevalent in developing cities, which lack the budget and proper surveillance systems to monitor transport services; because of these resource constraints, their monitoring infrastructure is limited. To improve the quality and patronage of public transit, authorities often rely on manual travel surveys, but such surveys frequently suffer from quality issues. For example, respondents may not provide all the details of their travel, the survey may have sampling bias, and, because of its close-ended design (a fixed set of specific questions in the questionnaire), much relevant information may never be captured. To address these issues, we investigated whether user-generated content, such as Twitter data, can be used to understand service quality in Greater Mumbai, India, as a complement to the existing manual survey process. To do this, we assumed that if a tweet is relevant to the public transport system and contains negative sentiment, then that tweet expresses the user's dissatisfaction with the public transport service. Since most tweets do not carry an explicit geolocation, we also present a model that not only extracts users' dissatisfaction with the public transit system but also retrieves the spatial context of that dissatisfaction and the potential causes affecting service quality. We observe that a Random Forest-based model outperforms the other machine learning models evaluated, achieving a precision of 0.97 and an F1-score of 0.88.
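The labeling assumption above can be sketched as a simple rule: a tweet is flagged as dissatisfaction only when it is both transit-relevant and negative. The sketch below is illustrative only. The `Tweet` fields, the relevance flag, the sentiment score range, and the zero threshold are hypothetical stand-ins for whatever relevance classifier and sentiment model the full pipeline uses. It also includes a helper that rearranges F1 = 2PR / (P + R) into R = F1 * P / (2P - F1), which shows the recall implied by the reported (rounded) precision and F1-score.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    is_transit_relevant: bool  # hypothetical: output of a relevance classifier
    sentiment: float           # hypothetical: polarity score in [-1, 1]


def expresses_dissatisfaction(tweet: Tweet, threshold: float = 0.0) -> bool:
    """Flag a tweet as dissatisfaction iff it is transit-relevant AND negative.

    The threshold of 0.0 is an assumption; any score below it counts as negative.
    """
    return tweet.is_transit_relevant and tweet.sentiment < threshold


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


if __name__ == "__main__":
    late_train = Tweet("Local train delayed again, 40 min wait", True, -0.7)
    print(expresses_dissatisfaction(late_train))
    # Recall implied by the reported precision and F1 (both rounded to 2 d.p.)
    print(round(implied_recall(0.97, 0.88), 3))
```

With precision 0.97 and F1 0.88, the helper gives a recall of roughly 0.81; because both inputs are rounded, this should be read as an approximation rather than a reported result.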