Applied Sciences, Vol. 12, No. 13 (2022)
ARTICLE
TITLE

Dual-Modal Transformer with Enhanced Inter- and Intra-Modality Interactions for Image Captioning

Deepika Kumar    
Varun Srivastava    
Daniela Elena Popescu and Jude D. Hemanth    

Abstract

Image captioning aims to describe an image in words that convey a semantically meaningful account of the scene depicted. Different models can be used for this difficult task, depending on the context and the requirements. An encoder-decoder model that takes image feature vectors as input to the encoder is often regarded as one of the appropriate models for captioning. In the proposed work, a dual-modal transformer is used that captures intra- and inter-modal interactions simultaneously within an attention block. The transformer architecture is quantitatively evaluated on the publicly available Microsoft Common Objects in Context (MS COCO) dataset, yielding a Bilingual Evaluation Understudy (BLEU)-4 score of 85.01. The efficacy of the model is evaluated on the Flickr 8k, Flickr 30k, and MS COCO datasets, and the results are compared and analysed against state-of-the-art methods. The results show that the proposed model outperforms conventional models such as the encoder-decoder model and the attention model.

Similar articles

       
 
Viktar Atliha and Dmitrij Šešok
Image captioning is a very important task, which is on the edge between natural language processing (NLP) and computer vision (CV). The current quality of the captioning models allows them to be used for practical tasks, but they require both large compu...
Journal: Applied Sciences

 
Wenjie Cai, Zheng Xiong, Xianfang Sun, Paul L. Rosin, Longcun Jin and Xinyi Peng    
Image captioning is the task of generating textual descriptions of images. In order to obtain a better image representation, attention mechanisms have been widely adopted in image captioning. However, in existing models with detection-based attention, th...
Journal: Applied Sciences

 
Boeun Kim, Saim Shin and Hyedong Jung    
Image captioning is a promising research topic that is applicable to services that search for desired content in a large amount of video data and a situation explanation service for visually impaired people. Previous research on image captioning has been...
Journal: Applied Sciences

 
Sreela S R and Sumam Mary Idicula    
Due to the rapid growth of deep learning technologies, automatic image description generation is an interesting problem in computer vision and natural language generation. It helps to improve access to photo collections on social media and gives guidance...
Journal: Information