Image-Captioning Model Compression

Viktar Atliha and Dmitrij Šešok

Abstract

Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). Current captioning models are accurate enough for practical tasks, but they require both large computational power and considerable storage space. Despite the practical importance of the image-captioning problem, only a few papers have investigated reducing model size in order to prepare these models for use on mobile devices. Furthermore, these works usually investigate only decoder compression in a typical encoder-decoder architecture, even though the encoder traditionally occupies most of the space. We applied the most efficient model-compression techniques, such as architectural changes, pruning, and quantization, to several state-of-the-art image-captioning architectures. As a result, all of these models were compressed by no less than 91% in terms of memory (including the encoder), while losing no more than 2% in CIDEr and 4.5% in SPICE. The best model achieved 127.4 CIDEr and 21.4 SPICE at a size of only 34.8 MB, which sets a strong baseline for compressing image-captioning models and could be used in practical applications.
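To make two of the techniques named in the abstract concrete, here is a minimal NumPy sketch of pruning and quantization applied to a single weight matrix. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not say which pruning criterion or quantization scheme was used, so the sketch assumes unstructured magnitude pruning and symmetric per-tensor 8-bit quantization. The function names, the 50% sparsity level, and the random 256 x 256 matrix are all hypothetical.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the `sparsity` fraction of weights with the smallest magnitude.

    Assumption: unstructured magnitude pruning (not confirmed by the source).
    """
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value is the pruning threshold;
    # any weights tied with the threshold are pruned as well.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned


def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float weights to int8.

    Assumption: one scale per tensor, mapping the largest magnitude to 127.
    """
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale


# Hypothetical layer: a random float32 matrix stands in for real model weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)

w_pruned = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(w_pruned)
w_restored = dequantize(q, scale)

dense_bytes = w.nbytes   # float32 storage: 4 bytes per weight
int8_bytes = q.nbytes    # int8 storage: 1 byte per weight
max_error = float(np.max(np.abs(w_restored - w_pruned)))

print(f"float32: {dense_bytes} bytes, int8: {int8_bytes} bytes")
print(f"zeroed weights: {np.mean(w_pruned == 0):.2%}, max quantization error: {max_error:.4f}")
```

In this sketch, quantization alone cuts storage by a factor of four, and the zeros introduced by pruning save further space only if the matrix is then stored in a sparse or entropy-coded format. How the reported reduction of at least 91% splits across architectural changes, pruning, and quantization is not stated in the abstract.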

Keywords

image captioning - model compression - pruning - quantization

ISSUE

Volume: 12 Issue: 3 (2022)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Algorithms
Information

DOI

https://doi.org/10.3390/app12031638

Similar articles

Panoptic Segmentation-Based Attention for Image Captioning

Wenjie Cai, Zheng Xiong, Xianfang Sun, Paul L. Rosin, Longcun Jin and Xinyi Peng

Image captioning is the task of generating textual descriptions of images. In order to obtain a better image representation, attention mechanisms have been widely adopted in image captioning. However, in existing models with detection-based attention, th...

Journal: Applied Sciences

Variational Autoencoder-Based Multiple Image Captioning Using a Caption Attention Map

Boeun Kim, Saim Shin and Hyedong Jung

Image captioning is a promising research topic that is applicable to services that search for desired content in a large amount of video data and a situation explanation service for visually impaired people. Previous research on image captioning has been...

Journal: Applied Sciences

Dense Model for Automatic Image Description Generation with Game Theoretic Optimization

Sreela S R and Sumam Mary Idicula

Due to the rapid growth of deep learning technologies, automatic image description generation is an interesting problem in computer vision and natural language generation. It helps to improve access to photo collections on social media and gives guidance...

Journal: Information
