Resumen
Violent attacks have been one of the hot issues in recent years. In the presence of closed-circuit televisions (CCTVs) in smart cities, there is an emerging challenge in apprehending criminals, leading to a need for innovative solutions. In this paper, the propose a model aimed at enhancing real-time emergency response capabilities and swiftly identifying criminals. This initiative aims to foster a safer environment and better manage criminal activity within smart cities. The proposed architecture combines an image-to-image stable diffusion model with violence detection and pose estimation approaches. The diffusion model generates synthetic data while the object detection approach uses YOLO v7 to identify violent objects like baseball bats, knives, and pistols, complemented by MediaPipe for action detection. Further, a long short-term memory (LSTM) network classifies the action attacks involving violent objects. Subsequently, an ensemble consisting of an edge device and the entire proposed model is deployed onto the edge device for real-time data testing using a dash camera. Thus, this study can handle violent attacks and send alerts in emergencies. As a result, our proposed YOLO model achieves a mean average precision (MAP) of 89.5% for violent attack detection, and the LSTM classifier model achieves an accuracy of 88.33% for violent action classification. The results highlight the model?s enhanced capability to accurately detect violent objects, particularly in effectively identifying violence through the implemented artificial intelligence system.