Future Internet, Vol. 13, Issue 2 (2021)
ARTICLE
TITLE

Video Captioning Based on Channel Soft Attention and Semantic Reconstructor

Zhou Lei and Yiyong Huang    

Abstract

Video captioning is the task of automatically generating a natural-language sentence that describes video content. Previous video captioning work mainly uses the encoder-decoder framework and relies on techniques such as attention mechanisms to improve the quality of the generated sentences. Most of these attention mechanisms focus on global features and spatial features, and the global features are usually fully connected features. Recurrent convolution networks (RCNs) receive 3-dimensional features as input at each time step, but the temporal structure of each channel across time steps, which provides temporal relation information for that channel, has been ignored. In this paper, a video captioning model based on channel soft attention and a semantic reconstructor is proposed, which considers the global information of each channel. In a video feature map sequence, the same channel at every time step is generated by the same convolutional kernel. We selectively collect the features generated by each convolutional kernel and then feed the weighted sum of each channel to the RCN at each time step to encode the video representation. Furthermore, a semantic reconstructor is proposed to rebuild semantic vectors and so ensure the integrity of semantic information during training; it takes advantage of both the forward (semantic to sentence) and backward (sentence to semantic) flows. Experimental results on the popular MSVD and MSR-VTT datasets demonstrate the effectiveness and feasibility of our model.
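The channel-wise weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the attention scores are computed, so the `mean_score` function, the flattened-list layout of the feature maps, and all names below are assumptions made for illustration only. The sketch shows one idea: for each channel, a softmax over time steps produces weights, and the channel's maps are combined as a weighted sum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_score(channel_map):
    """Hypothetical relevance score: the mean activation of one channel map."""
    return sum(channel_map) / len(channel_map)


def channel_soft_attention(features, score_fn=mean_score):
    """Attend over time separately for each channel.

    features[t][c] is a flattened H*W map for channel c at time step t.
    Returns (attended, weights): attended[c] is the weighted sum of channel c
    over all time steps, and weights[c][t] is the attention weight used.
    """
    num_steps = len(features)
    num_channels = len(features[0])
    attended, weights = [], []
    for c in range(num_channels):
        # One score per time step for this channel, normalized to sum to 1.
        alpha = softmax([score_fn(features[t][c]) for t in range(num_steps)])
        map_size = len(features[0][c])
        attended.append([
            sum(alpha[t] * features[t][c][i] for t in range(num_steps))
            for i in range(map_size)
        ])
        weights.append(alpha)
    return attended, weights


# Toy input: 2 time steps, 1 channel, 2 spatial positions (values are made up).
features = [[[1.0, 1.0]], [[3.0, 3.0]]]
attended, weights = channel_soft_attention(features)
```

In a full model the score would more plausibly be learned and conditioned on the recurrent state at the current time step, so that the weights change from step to step; the fixed `mean_score` here only stands in for that component.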

Similar articles

       
 
Fan Yang, Anita Moldenhauer-Roth, Robert M. Boes, Yuhong Zeng and Ismail Albayrak    
To study the fish behavioral response to up- and downstream fish passage structures, live-fish tests are conducted in large flumes in various laboratories around the world. The use of multiple fisheye cameras to cover the full width and length of a flume...
Journal: Water

 
Yazhi Liu, Dongyu Wei, Chunyang Zhang and Wei Li    
In QoE fairness optimization of multiple video streams, a distributed video stream fairness scheduling strategy based on federated deep reinforcement learning is designed to address the problem of low bandwidth utilization due to unfair bandwidth allocat...
Journal: Future Internet

 
Changbo Hou, Jiajun Ai, Yun Lin, Chenyang Guan, Jiawen Li and Wenyu Zhu    
In 21st-century society, with the rapid development of information technology, the scientific and technological strength of all walks of life is increasing, and the field of education has also begun to introduce high and new technologies gradually. Affec...
Journal: Future Internet

 
Pegah Salehi, Syed Zohaib Hassan, Myrthe Lammerse, Saeed Shafiee Sabet, Ingvild Riiser, Ragnhild Klingenberg Røed, Miriam S. Johnson, Vajira Thambawita, Steven A. Hicks, Martine Powell, Michael E. Lamb, Gunn Astrid Baugerud, Pål Halvorsen and Michael A. Riegler    
When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Curren...

 
Jannik Rößler, Jiachen Sun and Peter Gloor    
In the last 14 months, COVID-19 made face-to-face meetings impossible and this has led to rapid growth in videoconferencing. As highly social creatures, humans strive for direct interpersonal interaction, which means that in most of these video meetings ...
Journal: Future Internet