Abstract
Convolutional neural networks (CNNs) have achieved excellent results in image recognition, i.e., classifying the objects that appear in images. A typical CNN attains high performance with a deep architecture containing a large number of layers and weights. Such networks require considerable memory and computation, which not only increases training time but also limits real-time application of the trained model. For this reason, various neural network compression methods have been studied so that CNNs can run efficiently on small embedded hardware such as mobile and edge devices. In this paper, we propose a non-uniform quantization method based on kernel density estimation that performs compression efficiently. The proposed method quantizes weights efficiently using a sample of weights that is significantly smaller than the full set of original weights. Four-bit quantization experiments on ImageNet classification with various CNN architectures show that the proposed method quantizes weights at low computational cost without significant loss of model performance.
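To make the core idea concrete, the following is a minimal sketch of kernel-density-based non-uniform weight quantization, not the paper's exact algorithm: a KDE is fitted on a small random sample of a layer's weights, and the 2^4 quantization levels are placed at equiprobable quantiles of the estimated distribution so that levels cluster where weights are dense. The function name `kde_nonuniform_quantize`, the sample size, the grid resolution, and the quantile-based level placement are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_nonuniform_quantize(weights, n_bits=4, n_samples=1024, seed=0):
    """Quantize a weight tensor to 2**n_bits non-uniform levels derived
    from a KDE fitted on a small random sample of the weights.
    Illustrative sketch only; level placement is an assumption."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(weights).ravel()

    # Fit the density estimate on far fewer weights than the tensor holds.
    sample = rng.choice(flat, size=min(n_samples, flat.size), replace=False)
    kde = gaussian_kde(sample)

    # Evaluate the estimated density on a fine grid over the weight range.
    grid = np.linspace(flat.min(), flat.max(), 2048)
    density = kde(grid)

    # Place quantization levels at equiprobable quantiles of the estimated
    # distribution, so levels concentrate where weights are dense.
    cdf = np.cumsum(density)
    cdf /= cdf[-1]
    targets = (np.arange(2 ** n_bits) + 0.5) / 2 ** n_bits
    levels = np.interp(targets, cdf, grid)

    # Snap every weight to its nearest level (bin edges at level midpoints).
    edges = (levels[:-1] + levels[1:]) / 2.0
    quantized = levels[np.digitize(flat, edges)]
    return quantized.reshape(np.shape(weights)), levels
```

Because the KDE is fitted on only a few thousand sampled weights rather than the millions in a full model, the cost of deriving the levels stays small, which reflects the efficiency claim made above.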