JOURNAL
Future Internet

ARTICLE

TITLE

Video Captioning Based on Channel Soft Attention and Semantic Reconstructor

Zhou Lei and Yiyong Huang

Abstract

Video captioning is the task of automatically generating a natural-language sentence that describes the content of a video. Previous video captioning work mainly uses the encoder-decoder framework and exploits techniques such as attention mechanisms to improve the quality of the generated sentences. Most attention mechanisms, however, focus on global features and spatial features, and global features are usually fully connected features. Recurrent convolution networks (RCNs) receive 3-dimensional features as input at each time step, but the temporal structure of each channel, which carries temporal relation information for that channel, has been ignored. In this paper, we propose a video captioning model based on channel soft attention and a semantic reconstructor, which considers the global information of each channel. In a video feature map sequence, the same channel at every time step is generated by the same convolutional kernel. We selectively collect the features generated by each convolutional kernel and, at each time step, input the weighted sum of each channel to the RCN to encode the video representation. Furthermore, we propose a semantic reconstructor that rebuilds semantic vectors to preserve the integrity of semantic information during training; it takes advantage of both the forward (semantic to sentence) and backward (sentence to semantic) flows. Experimental results on the popular MSVD and MSR-VTT datasets demonstrate the effectiveness and feasibility of our model.
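The channel soft attention step described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the assumption that the RCN hidden state has the same channel count and spatial size as the feature maps, and the dot-product scoring function are all hypothetical choices made here. It shows only the core idea: for every channel, attention weights are computed over all time steps, and the RCN input at the current step is the per-channel weighted sum of that channel's feature maps.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_soft_attention(features, hidden):
    """Per-channel soft attention over time (illustrative sketch).

    features: (T, C, H, W) feature-map sequence from a CNN; channel c at
              every time step is produced by the same convolutional kernel.
    hidden:   (C, H, W) previous RCN hidden state. ASSUMPTION: it shares the
              channel count and spatial size of the feature maps.
    Returns the attended RCN input (C, H, W) and the weights (C, T).
    """
    # HYPOTHETICAL score: spatially averaged elementwise product between each
    # channel's feature map at time tau and the same channel of the hidden state.
    scores = (features * hidden[None]).mean(axis=(2, 3))  # (T, C)
    # Normalize over time separately for each channel, so weights sum to 1 per channel.
    weights = softmax(scores.T, axis=1)  # (C, T)
    # Weighted sum over time, computed independently for every channel.
    attended = np.einsum("ct,tchw->chw", weights, features)
    return attended, weights


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16, 7, 7))  # T=8 frames, C=16 channels, 7x7 maps
h = rng.standard_normal((16, 7, 7))
x, a = channel_soft_attention(feats, h)
print(x.shape, a.shape)  # (16, 7, 7) (16, 8)
```

With a zero hidden state every score is equal, so each channel's weights are uniform and the attended input reduces to temporal mean pooling. A learned scoring function would instead let different channels emphasize different frames.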

Keywords

video captioning - channel soft attention - semantic reconstructor - recurrent convolution networks

ISSUE

Volume: 13, Issue: 2 (2021)

SUBJECTS

INFRASTRUCTURE

SIMILAR JOURNALS

Water
Buildings
Future Internet

DOI

https://doi.org/10.3390/fi13020055

Similar articles

FishSeg: 3D Fish Tracking Using Mask R-CNN in Large Ethohydraulic Flumes

Fan Yang, Anita Moldenhauer-Roth, Robert M. Boes, Yuhong Zeng and Ismail Albayrak

To study the fish behavioral response to up- and downstream fish passage structures, live-fish tests are conducted in large flumes in various laboratories around the world. The use of multiple fisheye cameras to cover the full width and length of a flume...

Journal: Water

Distributed Bandwidth Allocation Strategy for QoE Fairness of Multiple Video Streams in Bottleneck Links

Yazhi Liu, Dongyu Wei, Chunyang Zhang and Wei Li

In QoE fairness optimization of multiple video streams, a distributed video stream fairness scheduling strategy based on federated deep reinforcement learning is designed to address the problem of low bandwidth utilization due to unfair bandwidth allocat...

Journal: Future Internet

Evaluation of Online Teaching Quality Based on Facial Expression Recognition

Changbo Hou, Jiajun Ai, Yun Lin, Chenyang Guan, Jiawen Li and Wenyu Zhu

In 21st-century society, with the rapid development of information technology, the scientific and technological strength of all walks of life is increasing, and the field of education has also begun to introduce high and new technologies gradually. Affec...

Journal: Future Internet

Synthesizing a Talking Child Avatar to Train Interviewers Working with Maltreated Children

Pegah Salehi, Syed Zohaib Hassan, Myrthe Lammerse, Saeed Shafiee Sabet, Ingvild Riiser, Ragnhild Klingenberg Røed, Miriam S. Johnson, Vajira Thambawita, Steven A. Hicks, Martine Powell, Michael E. Lamb, Gunn Astrid Baugerud, Pål Halvorsen and Michael A. Riegler

When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Curren...

Journal: Big Data and Cognitive Computing

Reducing Videoconferencing Fatigue through Facial Emotion Recognition

Jannik Rößler, Jiachen Sun and Peter Gloor

In the last 14 months, COVID-19 made face-to-face meetings impossible and this has led to rapid growth in videoconferencing. As highly social creatures, humans strive for direct interpersonal interaction, which means that in most of these video meetings ...

Journal: Future Internet
