Resumen
Scene or place classification is one of the important problems in image and video search and recommendation systems. Humans can understand the scene they are located, but it is difficult for machines to do it. Considering a scene image which has several objects, humans recognize the scene based on these objects, especially background objects. According to this observation, we propose an efficient scene classification algorithm for three different classes by detecting objects in the scene. We use pre-trained semantic segmentation model to extract objects from an image. After that, we construct a weight matrix to determine a scene class better. Finally, we classify an image into one of three scene classes (i.e., indoor, nature, city) by using the designed weighting matrix. The performance of our scheme outperforms several classification methods using convolutional neural networks (CNNs), such as VGG, Inception, ResNet, ResNeXt, Wide-ResNet, DenseNet, and MnasNet. The proposed model achieves 90.8% of verification accuracy and improves over 2.8% of the accuracy when comparing to the existing CNN-based methods.