Applied Sciences, Vol. 10, No. 19 (2020)
Title

Learn and Tell: Learning Priors for Image Caption Generation

Pei Liu, Dezhong Peng and Ming Zhang

Abstract

In this work, we propose a novel priors-based attention neural network (PANN) for image captioning. PANN incorporates two kinds of priors into the visual information extraction process at each word prediction: the probabilities of local region proposals being mentioned in the caption (PBM priors) and part-of-speech clues for caption words (POS priors). The work is motivated by two intuitions: region proposals differ in their inherent probability of being mentioned in a caption, and POS clues link the word class (part-of-speech tag) to the categories of visual features. We propose new methods to extract both priors. The PBM priors are obtained by computing the similarities between the caption feature vector and the local feature vectors, while the POS priors are predicted at each step of word generation from the hidden state of the decoder. Both priors are then incorporated into the PANN module of the decoder to help it extract more accurate visual information for generating the current word. In our experiments, we qualitatively analyzed the proposed approach and quantitatively evaluated several captioning schemes with our PANN on the MS-COCO dataset. The experimental results demonstrate that the proposed method achieves better performance and confirm the effectiveness of the proposed network for image captioning.
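The abstract gives no formulas, so the following is only a minimal sketch of how the two priors might enter an attention step, not the authors' implementation. Everything specific in it is an assumption made for illustration: rescaled cosine similarity as the PBM similarity measure, a single linear layer with softmax as the POS predictor, adding log-priors to the attention logits, and gating the context vector by the probability of "visual" tags. The function names and toy vectors are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def pbm_priors(caption_vec, region_vecs):
    # Assumption: similarity = cosine, rescaled from [-1, 1] to [0, 1]
    # so it can be read as a "probability of being mentioned".
    return [(1.0 + cosine(caption_vec, r)) / 2.0 for r in region_vecs]

def pos_priors(hidden, tag_weights):
    # Assumption: one linear layer on the decoder hidden state,
    # followed by a softmax over POS tags.
    return softmax([dot(w, hidden) for w in tag_weights])

def prior_attention(hidden, region_vecs, pbm, pos, visual_tags, eps=1e-8):
    # Assumption: PBM priors bias the attention logits in log space,
    # and the POS prior gates how much visual context is passed on.
    logits = [dot(r, hidden) + math.log(p + eps)
              for r, p in zip(region_vecs, pbm)]
    weights = softmax(logits)
    gate = sum(pos[t] for t in visual_tags)
    dim = len(region_vecs[0])
    context = [gate * sum(w * r[i] for w, r in zip(weights, region_vecs))
               for i in range(dim)]
    return weights, context

# Toy inputs (hypothetical): three region features, one caption feature,
# one decoder hidden state, and weights for three POS tags.
regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
caption = [1.0, 0.2]
hidden = [0.5, 0.5]
tag_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # noun, adjective, other

pbm = pbm_priors(caption, regions)
pos = pos_priors(hidden, tag_weights)
weights, context = prior_attention(hidden, regions, pbm, pos, visual_tags=[0, 1])
```

In this toy setup the first region, which is most similar to the caption feature, receives the largest PBM prior, and the context vector shrinks when the predicted POS distribution favors non-visual tags.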
